To Galina




Preface 



Cryptology is nowadays one of the most important subjects of applied mathematics.
Not only is the task of keeping information secret important, but so are
the problems of integrity and authenticity: one wants to prevent an
adversary from changing the message into a fraudulent one without the receiver
noticing it, and the receiver of a message should be able
to be sure that it has really been sent by the authorized person (electronic
signature). A big impetus for modern cryptology was the invention of
so-called public-key cryptosystems in the 1970s by Diffie, Hellman, Rivest,
Shamir, Adleman, and others. In particular in this context, deep methods 
from number theory and algebra began to play a decisive role. This aspect of 
cryptology is explained in, for example, the monograph “Algebraic Aspects 
of Cryptography” by Koblitz (1999). The goal of these notes was to write a 
treatment focusing rather on the stochastic (i.e., probabilistic and statistical) 
aspects of cryptology. As this direction also consists of a huge literature, only 
some glimpses can be given, and by no means are we always at the frontier 
of the current research. The book is rather intended as an invitation for students,
researchers, and practitioners to study certain subjects further. We
have tried to be as self-contained as reasonably possible; however, we suppose
that the reader is familiar with some fundamental notions of probability and 
statistics. It is our hope that we have been able to communicate the fascina- 
tion of the subject and we would be delighted if the book encouraged further 
theoretical and practical research. 

Let me express my gratitude to my colleagues in the Cryptology Section in the
Ministry of Defense of Switzerland for the excellent and stimulating working
atmosphere. Many thanks are also due to Werner Schindler from the
German “Bundesamt für Sicherheit in der Informationstechnik” for helpful
discussions. Furthermore, I am indebted to Springer-Verlag, Heidelberg for
the agreeable cooperation. However, the most important thanks goes to my 
wife Galina for her constant moral support of my scientific activities. Without 
her asking “How is your book?” from time to time, the latter would certainly 
not yet be finished! 



Bern, February 2004 



Daniel Neuenschwander 
Introduction



Background 

Cryptology is nowadays considered as one of the most important fields of 
applied mathematics. Also, aspects from physics and, of course, engineering 
science play important roles. Classical cryptology consisted almost entirely 
of the problem of secret keeping. The so-called “Caesar shift code” was just 
a shift of the alphabet by a certain number of places, e.g., 3 places (then the
plaintext letter “a” was encrypted by the ciphertext letter “D”, “b” by “E”,
etc., “w” by “Z”, and then “x” by “A”, “y” by “B”, “z” by “C”). Such a shift
code is, of course, trivial to decrypt¹, because one needs to try only 25 possibilities
with some groups of consecutive ciphertext letters until one obtains
some meaningful plaintext. More general are monoalphabetic substitutions,
which are just arbitrary permutations of the alphabet. Here, one has 26! − 1 ≈ 4·10^26
possibilities, but as the same plaintext letter always corresponds to the same
ciphertext letter and vice versa, frequent letters (or pairs/triples of letters) in
the ciphertext will with great probability correspond to frequently occurring 
letters (pairs/triples) in the language in which the plaintext is written, for 
example the letter “e” in German. For example, the following features of Ger- 
man language support the decryption of monoalphabetic encryptions: If in 
the ciphertext a triple of consecutive letters occurs several times, then there is 
a good chance that it corresponds to the plaintext triple “sch” ; the plaintext 
letter “c” is almost always succeeded by “h” or “k”, “q” by “u” with hardly 
any exceptions. In any language (and also with more general cryptosystems) 
the encryptor should avoid the use of “mots probables” (words from which 
an adversary can conjecture that they appear in the plaintext, e.g., military 
terms, “Heil Hitler”, etc.). During the Second World War, this danger was 
often neglected, a mistake that was not the most important, but one of sev- 
eral reasons why enemy codes were decrypted in a decisive measure at that 
time. In recent years, many documents have been (and still are) found by 
historians in archives which confirm this fact. In the year 1586, the French
diplomat Blaise de Vigenère (1523-1596) devised a polyalphabetic code that
was thought to be “unbreakable” for centuries. This code will be presented
in Section 1.1 of our text, together with the attacks on it, which were found no earlier
than in the second half of the 19th and at the beginning of the 20th century.
After the spectacular successes in decrypting rotor enciphering machines such
as ENIGMA during the Second World War, the development of modern cryptology
received a great impetus in the second half of the 1970s
from the invention of so-called public-key cryptosystems, in particular the code
that is now known under the name “RSA system” (named after the authors
who published it, namely “R” for Rivest, “S” for Shamir, and “A” for
Adleman). Its detailed working is described in Section 2.1. The only non-trivial
ingredient is Fermat’s Little Theorem, which was known as a piece
of “pure” number theory long before. It has turned out since then that number
theory and algebra are of decisive importance in modern cryptology, both in
cryptography and cryptanalysis, in contrast to the assertion of the English
mathematician G. H. Hardy (1877-1947) that by analyzing primes one “can not
win wars”!

¹ In all our subsequent text, the word “decipher” will mean the decoding of a
ciphertext by its legitimate receiver, whereas “decrypt” will mean the breaking
of the code by an adversary.

D. Neuenschwander: Prob. and Stat. Methods in Cryptology, LNCS 3028, pp. 1-7, 2004.

Nowadays, not only (classical) algebra and number theory, but also many 
other fields of mathematics, such as highly advanced topics of algebra and 
number theory (such as, for example, modern algebraic geometry, elliptic 
curves), graph theory, finite geometry (see, for example, Walther (1999)), 
probability, statistics, etc., play a role in cryptography, not to mention the re- 
cent (at least theoretical) developments in quantum computing and quantum 
cryptography (based on quantum mechanics) and all questions on hardware 
implementation of cryptosystems. 

Furthermore, other goals entered into cryptology, namely the task of securing
the integrity and authenticity of a message. This means that (even for
a possibly open transmission channel) one wants to prevent the message from being
changed by some unauthorized person without the receiver noticing it, and,
on the other hand, the receiver wants to be sure that the authorized
person really was the sender of the message (electronic signature). (In this context,
we also mention the (already old) concept of steganography, where
even the mere fact that a message has been transmitted (not only its contents)
is to be kept secret. We will not discuss this subject further.) In
addition, generalizations to multiparty systems have emerged. Nowadays,
network security is a very important problem in practice.

A systematic introduction to the algebraic and number theoretic aspects was 
given in the Koblitz (1999) book “Algebraic Aspects of Cryptography”. The
goal of our text will be to give a similar insight into some probabilistic and 
statistical methods (in its broadest sense, so, for example, also using quan- 
tum stochastics) of cryptology. By no means do we claim completeness, only 
some introductions to certain topics can be given. Important areas, such as for 
example secret sharing, multi-party systems, zero-knowledge, problems on in- 
formation transmission channels, linear cryptanalysis, digital fingerprinting, 
visual cryptography (see, for example, de Bonis, de Santis (2001)), etc., had to 
be (almost) entirely excluded. For further reading, we recommend that readers
consult, in particular, the Journal of Cryptology and the various conference
proceedings series, e.g., in the Springer Lecture Notes in Computer Science
(EUROCRYPT, CRYPTO, ASIACRYPT, AUSCRYPT, INDOCRYPT,
FAST SOFTWARE ENCRYPTION, etc.). Also of interest are the
journals Designs, Codes and Cryptography and IEEE Transactions on Information
Theory, together with several “computational” periodicals. Sometimes,
very important information can also be found in mathematical and
stochastic journals/books, though this is rather the exception compared to 
the specific series devoted more to what is nowadays called “Theoretical Com- 
puter Science”. 



Book Structure 

Let us now give a short description of the contents of the present book. 

As already mentioned, in Section 1.1 we present the famous classical Vigenère
system, which for a long time was believed to be as “secure as possible”. Of
course, no cryptosystem is absolutely secure in the literal sense of the word,
since there is always the possibility of exhaustive search (in many cases
no better attack is known, but up to now there is also no proof that no better
attack exists). (Somewhat exceptional is quantum cryptography
as it is briefly described in Chapter 13. But this is research in progress.)
So even finding a reasonable definition of the “security” of a cryptosystem
is a non-trivial task. In Section 1.2 we speak about the most natural (but
expensive to realize) notion of “perfect secrecy”, whereas other security con- 
cepts (weaker, but often more easily implementable and testable ones) are 
discussed in Sections 5.1 (Golomb’s conditions, PN-sequences), 5.3 (“perfect
pseudo-randomness”, which means that a source cannot “efficiently” be distinguished
from a truly random source), 5.4 ((“almost”) ideal local statistics),
Chapter 10 (“semantic security”, which is a “polynomially bounded” version
of perfect secrecy in the sense that one assumes that the adversary has only 
“polynomial” computational resources), and Chapter 11 (“algorithmic com- 
plexity”). Of course, theoretically quite weak but in practice not unimportant 
is the requirement for maximal linear complexity (see Sections 5.1 and 7.11), 
if one confines oneself to linear feedback shift registers. A short remark fol- 
lows about a misleading “intuitive” idea concerning cascade ciphers, against 
which Massey and Maurer (1993) warned in their paper “Cascade Ciphers: 
The Importance of Being First” . 

Chapter 2 is devoted to public-key ciphers, in particular to the RSA system. 
After the introduction of the RSA system, whose basis is the (probably true 
and therefore generally supposed) computational difficulty of factoring large 
integers, we present two of the best-known probabilistic primality tests (the
Solovay-Strassen test, which, loosely speaking, tests Euler’s criterion for the
Legendre-Jacobi symbol, and the Rabin test, which is related to Fermat’s
Little Theorem for residue rings modulo a prime). A specially designed probabilistic
prime number test for numbers congruent to 3 (mod 4) (i.e., candidates
for prime factors of so-called Blum integers) has been presented by Müller
(2003). In Section 2.4 we prove that in the RSA system, one has a “hard” 
least significant bit, which means that if ever one finds a probabilistic poly- 
nomial time algorithm for calculating the least significant bit of the plaintext 
from the public key and the ciphertext, then there exists also a probabilis- 
tic polynomial-time algorithm for reconstructing the whole plaintext from 
these data. “Hard bits” have been the subject of much subsequent literature. 
Another public-key algorithm, the Diffie-Hellman system, will be discussed 
in Chapter 8. Section 2.5 warns against careless hardware implementation, 
so that certain internal parameters (e.g., processing time) can be measured 
by the adversary, and advises on avoiding such attacks. For further reading 
about the subject of “timing attacks”, we also refer to Schindler (2002a). In 
Section 2.6 we show how somebody can persuade his/her friend that he/she 
has found an RSA-secret key of somebody else without revealing any infor- 
mation about it, thus giving a first glimpse into the field of zero-knowledge 
proofs. 

Chapter 3 presents Shor’s algorithm (for whose invention Shor got the Nevan- 
linna prize) for factoring numbers with quantum computers. One must admit 
that up to now, quantum computers have been rather a theoretical concept 
and not yet producible in a usable way. The latest news about hardware re- 
search in this direction is rather pessimistic. Of course, from the viewpoint of 
users of classical cryptological devices this is reassuring, for if an adversary 
were really in possession of a quantum computer working on a large scale, 
then virtually all cryptosystems whose security is based on the “intractabil- 
ity” of the problem of factorizing numbers or the discrete logarithm problem 
would be breakable in “no” time (more precisely: in polynomial time, whereas for
classical algorithms up to now only behavior (e.g., for the quadratic or the number field sieve) of
an order little better than exponential is known). We do not assume that
the reader has any preliminary knowledge of quantum theory. All necessary 
explanations are given in Section 3.2. Shor’s algorithm makes use of a result 
from the theory of continued fractions, which we will present in Section 3.3.

Almost all cryptosystems work with keys, which, as a doctrine (at least in
theoretical cryptology), is the only information on the cryptosystem that is 
assumed to (and can realistically) be kept secret. That is, one always as- 
sumes, in order to be on the safe side, that the adversary is in possession of 
the device that has been used for encryption/deciphering, but he has virtu- 
ally no information about the key. The most secure way to provide a good key 
is to generate it with a genuine, physical generator, e.g., radioactive sources 
with Geiger counters or electronic noise produced by a semiconducting diode 
(see Chapter 4). For general use, for example, HOT BITS is a source of ran- 
dom bits stemming from beta radiation from the decay of krypton-85, and 
is available on the Internet. However, physical devices are very slow compared
to pseudo-random generators, which we will treat in Chapter 5. Some
considerations about possible constructions of good physical random number
generators, as well as some discussions on their quality due to Zeuner and the
author, are the subject of Section 4.2. In Section 4.3 we address the general
problem of obtaining random bits that are as unbiased as possible if the
available source only produces random bits with a certain bias. We will calculate
the “extraction rate” (which indicates in some sense the asymptotic
speed at which the bias diminishes per new random bit source, when the final
output bit is produced by adding (mod 2) independent biased random bit
sources) for rational biases. Interestingly enough, the extraction rate turns
out to be independent of the size of the bias b, but to be determined solely
by the arithmetic properties of b. However, one finds that the extraction rate
is 0 for Lebesgue-almost all biases b.
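
To make the effect of adding biased bits mod 2 concrete, here is a minimal Python sketch. It is not the extraction-rate computation of Section 4.3; it only illustrates how the bias of the sum shrinks. It assumes the convention that a bit with P(bit = 1) = 1/2 + b has bias b (the book's own definition may differ), and the function names are hypothetical. Under that convention the sum mod 2 of independent bits with biases b_1, ..., b_n has bias -(1/2)·(-2b_1)···(-2b_n), which the sketch checks against a brute-force enumeration.

```python
from fractions import Fraction
from itertools import product


def xor_bias_exact(biases):
    """Bias of the XOR of independent bits, by enumerating all bit patterns.

    Assumed convention: a bit with bias b satisfies P(bit = 1) = 1/2 + b.
    """
    p_one = Fraction(0)
    for bits in product((0, 1), repeat=len(biases)):
        prob = Fraction(1)
        for bit, b in zip(bits, biases):
            p1 = Fraction(1, 2) + b
            prob *= p1 if bit else 1 - p1
        if sum(bits) % 2 == 1:          # XOR of the pattern equals 1
            p_one += prob
    return p_one - Fraction(1, 2)


def xor_bias_formula(biases):
    """Closed form: -(1/2) * product of (-2 * b_i)."""
    prod = Fraction(1)
    for b in biases:
        prod *= -2 * b
    return -prod / 2


if __name__ == "__main__":
    sources = [Fraction(1, 10)] * 3     # three sources with P(1) = 0.6
    print(xor_bias_exact(sources), xor_bias_formula(sources))
```

With three sources of bias 1/10, the combined bias drops to 1/250, so each additional independent source multiplies the magnitude of the bias by 2|b|.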

In what follows, by contrast, we deal with pseudo-random generators.
In Chapter 5, we present some important examples (linear feedback
shift registers (Section 5.1) and combinations thereof (Section 5.5), non-linear 
feedback shift registers (Section 5.4), shrinking and self-shrinking generators 
(Section 5.2), and the quadratic congruential generator (Section 5.6)). 
Chapter 6 is a brief introduction to the most important notions of infor- 
mation theory as it is of use for us and to the aforementioned problem of 
authenticity. Section 6.3 is a new unorthodox approach. 

In Chapter 7 we give a collection of some of the best-known tests for pseudo-random-number
generators, following to a great extent the tests
suggested by Rukhin (2000a,b) and the test battery used for the evaluation of
the AES. As is well known, the block cipher “data encryption
standard” (DES) was widely used for a long time, but, by using parallelism, it has
been possible to break it. Then the NIST (National Institute of Standards 
and Technology) invited the worldwide cryptologic community to develop an 
“advanced encryption standard” (AES). The winner of this contest was the 
algorithm RIJNDAEL designed by Rijmen and Daemen. 

Chapter 8 discusses the distribution of keys in the Diffie-Hellman public-key
system. In this context, the notion of “strong primes” (primes p that are of
the form p = 2q + 1 (where q is a prime)) is useful. Namely, it turns out
that if the modulus is a strong prime, then the entropy of the Diffie-Hellman
key is nearly the maximum possible, which means that it is advisable
to use strong primes as moduli. Considerations about bit security similar to those
in Section 2.4 apply to the Diffie-Hellman system, too. We refer to
González Vasco, Shparlinski (2001).

Chapter 9 describes an attack on block ciphers that has become very popu- 
lar in recent years, namely differential cryptanalysis. Roughly speaking, here 
the cryptanalyst makes use of cases where “differences/sums” (in the alge- 
braic sense) of pairs of plaintexts leak through to differences/sums of the 
corresponding pairs of ciphertexts. In an iterative r-round block cipher, with 
this method it is sometimes possible to guess the r-th round subkey, then the 
(r−1)-th round subkey, etc., iteratively until the whole key is found. Interestingly
enough, although the theoretical results are generally proved under the
assumption that the round keys are chosen as i.i.d. (independent and iden- 
tically distributed), in practice they are experimentally verified (sometimes 
with even better behavior) if some key schedule algorithm is used. Section 
9.2 generalizes distributional results for so-called characteristics (i.e., pairs 
of differences of plaintext / ciphertext pairs of bitstrings) due to Hawkes and 
O’Connor to residue rings of arbitrary modulus. Matsui (1994) developed 
the related concept of linear cryptanalysis, which we have excluded from our 
presentation. 

In Chapter 10 we deal with semantic security. Roughly speaking, semantic 
security is a polynomially bounded variant of perfect security, i.e., one as- 
sumes that the adversary has only polynomially bounded resources. 

A notion of “algorithmic complexity” (the so-called “Turing-Kolmogorov- 
Chaitin complexity” , which is — roughly speaking — the length of the short- 
est program that one must feed to a universal Turing machine to generate 
as output a given bitstring) is considered in Chapter 11. However, this is of 
rather theoretical interest, since the algorithmic complexity of a given bitstring
is not computable (in the sense of Church’s thesis). It turns out
that in the sense of the Haar measure, for almost all bitstrings the algorithmic
complexity is equal to the linear complexity, thus here we have a
situation somewhat similar to that for the extraction rate of biases in Section 4.3.
At first glance this contradicts the fact that there are very simply constructed
bit sequences with maximal linear complexity (e.g., 00...01), but the above-mentioned
equivalence is not valid for “effectively constructible” sequences
(see the title of the paper of Beth and Dai (1990): “If you can describe a 
sequence, it can’t be random.”). 

Chapter 12 addresses the problem of collisions and the related “meet-in-the- 
middle” attack, which has to do with the well-known birthday paradox from 
probability theory. 
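
As a rough illustration of why the birthday paradox matters for collisions, the following Python sketch computes the probability that k independent, uniformly distributed values from a set of size n contain at least one repeat. The function names are hypothetical, and the uniform, independent model is an assumption of the sketch, not a statement about any particular cryptosystem.

```python
import math


def collision_probability(k, n):
    """P(at least one repeat) among k independent uniform draws from n values."""
    if k > n:
        return 1.0
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (n - i) / n       # the (i+1)-th draw avoids the first i values
    return 1.0 - p_distinct


def draws_for_even_odds(n):
    """Smallest k for which a collision is at least as likely as not."""
    k = 1
    while collision_probability(k, n) < 0.5:
        k += 1
    return k


if __name__ == "__main__":
    print(draws_for_even_odds(365))                    # classical birthday setting
    n = 2 ** 16
    print(draws_for_even_odds(n), 1.1774 * math.sqrt(n))
```

The number of draws needed grows only like the square root of n (about 1.18·√n for even odds), which is why collision-type attacks need far fewer trials than an exhaustive search over all n values.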

Finally, we give a short glimpse into quantum cryptography in Chapter 13. 
In this situation, the receiver of an encrypted message will immediately de- 
tect (with arbitrarily large probability) if an adversary has manipulated the 
message (maybe even only “measured” it in the quantum-mechanical sense), 
which in general is of course not the case in classical cryptosystems. However, 
here also, the technology has not yet been developed far enough. Note that 
Chapter 13 deals with “genuine” quantum cryptography, whereas in Chapter 
3 we showed how to solve a problem of classical cryptography by means of 
quantum computing. 

Finally, a word about giving proper credits should be said: In cryptology, 
it is even more difficult than in other sciences to know to whom a certain 
result should really be attributed, since often methods that have been pub- 
lished later have already been developed (at least to a certain extent) before 
by cryptologists who were not allowed to publish their findings, especially 
during the time of the Second World War and the Cold War. So, citations
of literature in our text should hardly be interpreted as a reference giving
credit to a certain person or group of persons. For example, one sees few
Russian names occurring in the cryptological literature; however, it has turned
out that Soviet cryptologists, too, have had important successes, for example in
cryptanalysis.

In the body of this book, we give few formal citations, in order not to inter- 
rupt the smoothness of the presentation too much. Instead, we have included 
a section “Bibliographical Remarks” at the end of the text. 

Chapters and sections with an asterisk treat more specific subjects and can 
be omitted at first reading. 



About Notation and Terminology 

Throughout the book, the symbol $\mathbb{B}$ will denote $GF(2) = \mathbb{Z}_2$, the field with
the two elements 0 and 1, which will be called “bits” (exception: Section 4.3).
Also, for a sequence $x = (x_1, x_2, \ldots)$, the symbol $x^{(n)}$ will mean the finite
subsequence consisting of the first n elements: $x^{(n)} = (x_1, x_2, \ldots, x_n)$. The
indicator function of the set B will be written as $1_B(\cdot)$.

“W.l.o.g.” means “without loss of generality”. The shorthands “i.i.d.” and 
“a.s.” stand for the probabilistic notions “independent and identically dis- 
tributed” and “almost surely” (i.e., “with probability one”). As already men- 
tioned in the footnote at the beginning, the word “decipher” will mean the 
decoding of a ciphertext by its legitimate receiver, whereas “decrypt” is the 
breaking of the code by an adversary. 




1 Classical Polyalphabetic Substitution 
Ciphers 



1.1 The Vigenère Cipher

The classical situation in cryptology, which we will consider below, is the
following: There are two parties, A (called “Alice” in the jargon) and B (called
“Bob”). Alice would like to send a message to Bob by some channel. But
this channel is insecure because in between the two there is some adversary
(“enemy”, eavesdropper) E (called “Eve”) who either wants

- to listen in on the message sent from A to B and/or 

- to send a message herself to B, asserting that this message comes from A 
and/or 

- to change a message indeed sent by A to B. 

All three attacks should be prevented. The first attack (listening in) concerns
the problem of secrecy (or confidentiality), the second that of authenticity,
and the third that of integrity. In other words, there are two independent
goals: to achieve secrecy, the output of the channel from A to B should be exclusive;
to achieve authenticity/integrity, its input should be exclusive. Of course, there are
more general cryptologic situations (multi-party models, secret sharing, zero-knowledge,
etc.). But these will not be considered here (except in the short
Section 2.6). Also, the integrity/authenticity problem will only be addressed
in Sections 2.1 (electronic RSA signature) and 6.2 (impersonation attack),
and Chapter 12 (meet-in-the-middle attack). Apart from that, in this introductory
text we will mainly be concerned with secret keeping.

In this chapter, we will present a classical cryptosystem, the so-called Vigenère
cipher, invented in 1586 by the French diplomat Blaise de Vigenère
(1523-1596). It belongs to the class of polyalphabetic cryptosystems, which
means that the same letter of plaintext is not always encoded by the same 
letter of ciphertext. This fact is of great importance in general. If a cryptosys- 
tem is monoalphabetic, i.e. if every letter of plaintext is always encrypted by 
the same letter of ciphertext, then statistical properties of the letters of the 
language in which the plaintext is written automatically leak through to the 
ciphertext, i.e. (for long enough messages) frequent letters (or m-grams) in 
the ciphertext correspond to frequent letters (or m-grams) in the plaintext, 
and by some statistical analysis it is, in general, not too difficult to find the
plain-/ciphertext correspondence of frequent letters (m-grams) of the language.
To fill in the rest, often some “trial and error” helps (in particular
with some additional information about “mots probables” (words that are
likely to occur in the message)).

The Vigenère system is very simple and works as follows: Given a keyword,
e.g., “PEACE”, and the plaintext

OSAMABINLADEN,

one writes the plaintext and the repeated keyword one below the other and
“adds” the corresponding letters mod 26 (where A is interpreted as 0, B as
1, etc.) to obtain the ciphertext:



Plaintext    O S A M A B I N L A D E N
Keyword      P E A C E P E A C E P E A
Ciphertext   D W A O E Q M N N E S I N
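
The letter-by-letter addition mod 26 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `vigenere` is hypothetical, and the code assumes that both text and keyword consist solely of the capital letters A to Z. It reproduces the example with the keyword PEACE and shows that subtracting the keyword recovers the plaintext.

```python
from itertools import cycle

A = ord("A")


def vigenere(text, keyword, decrypt=False):
    """Add (or, for decrypt=True, subtract) the repeated keyword mod 26.

    Assumes text and keyword contain only the capital letters A-Z.
    """
    sign = -1 if decrypt else 1
    out = []
    for t, k in zip(text, cycle(keyword)):
        shift = sign * (ord(k) - A)                 # A -> 0, B -> 1, ...
        out.append(chr((ord(t) - A + shift) % 26 + A))
    return "".join(out)


if __name__ == "__main__":
    ciphertext = vigenere("OSAMABINLADEN", "PEACE")
    print(ciphertext)                                # DWAOEQMNNESIN
    print(vigenere(ciphertext, "PEACE", decrypt=True))
```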



If Bob knows the keyword, he can retrieve the plaintext from the ciphertext
simply by subtracting the corresponding letters of the keyword mod 26. But
as far as cryptanalysis is concerned, one must say that although this system is
polyalphabetic as such, always after k places (if k is the length of the keyword)
the same substituting alphabet (which is even just a shift of the original
alphabet in the sense of its interpretation as elements of $\mathbb{Z}_{26}$) is used. This
gives rise to an algebraic method (the so-called Kasiski test) of determining
the keyword length up to multiples. Together with the stochastic Friedman
test, which yields the order of magnitude of the length of the keyword, one
can determine in most cases the actual length of the keyword. If this is known,
then for every place modulo the length of the keyword, one identifies the letter
of the ciphertext that occurs most frequently with some very frequent letter
of the language in which the plaintext is written to determine the shift,
and then with a little routine work one can (in general) reconstruct the
plaintext.

Let us describe the details: The Kasiski test is named after
the Prussian major Friedrich Wilhelm Kasiski (1805-1881), although it had
been found nine years before him (but not published) by Charles
Babbage (1792-1871) in 1854. It rests on the following observation: If a certain
word (for example a preposition or a conjunction, etc.) occurs several times
in the plaintext and if by chance (the probability of which is often quite large) the distance
between two such occurrences of the same word is a multiple of the length of
the keyword, then this word is encoded both times by the same sequence of
letters in the ciphertext. Or, put the other way round, if one detects the
same subsequences of letters (maybe even short ones, e.g., of length 3) several
times in the ciphertext, then the distance between them is quite probably a
multiple of the keyword length.

Now the second part will be a little more
involved: it is the so-called Friedman test, which was developed by William
Friedman in 1925. This test is of a stochastic nature. Consider a
plaintext of n letters, built from the Latin alphabet with the 26 characters
“A”, “B”, .... Let $n_1$ be the number of “A”s, $n_2$ the number of “B”s, etc. in
the plaintext (hence $n = \sum_{i=1}^{26} n_i$). Then the index of coincidence $I$ is defined
as the probability that an arbitrary pair of letters taken from the plaintext
consists of the same 2 letters, i.e.

$$I = \frac{\sum_{i=1}^{26} n_i (n_i - 1)}{n(n-1)}.$$

If $p_i$ denotes the probability that on some fixed place (in a text of the considered
language) letter i occurs, then (if the text is long enough) we have

$$I \approx \sum_{i=1}^{26} p_i^2. \tag{1.1}$$

The expression on the right-hand side of (1.1) decreases as the distribution
of the letters in the language becomes more uniform, and takes its minimum
0.0385 if $p_i = 1/26$ for all $i \in \{1, 2, \ldots, 26\}$. The index of coincidence of
a natural language is typically about twice this value (e.g. about 0.0667
for English). With a monoalphabetic substitution, the index of coincidence
remains unchanged, whereas it decreases (in general) with a polyalphabetic
substitution. So the index of coincidence of a polyalphabetic substitution tends
to be low (near 0.0385), whereas a significantly higher value suggests that
a monoalphabetic substitution method has been used. Now $I$ (from the ciphertext)
can be used to determine the approximate length of the keyword as
follows: Assume the keyword has length $\ell$ (and, for simplicity and w.l.o.g., that $n$ is
a multiple of $\ell$). Then write an $((n/\ell) \times \ell)$-matrix $M$ where the letters number
$k + j\ell$ ($j = 0, 1, 2, \ldots, (n/\ell) - 1$) of the ciphertext form the $k$-th column. Now
if we take a (random) pair of letters in some fixed column, the probability
that both letters are equal is about (in practice a little more than) 0.0667,
since the individual columns have been encrypted monoalphabetically. The
number of pairs of two letters of the same column is given by $n((n/\ell) - 1)/2$.
If we take random pairs of letters of two different columns, the probability
of obtaining the same letter twice is about 0.0385 (if the keyword is “long”
and “random” enough). The number of pairs from two different columns is
$n(n - (n/\ell))/2$, since there are $n(n-1)/2$ pairs in total. Hence the probability $p$ to have equal letters if one takes
a pair of two letters from the matrix $M$ at random is about

$$p \approx \frac{\frac{n((n/\ell) - 1)}{2} \cdot 0.0667 + \frac{n(n - (n/\ell))}{2} \cdot 0.0385}{n(n-1)/2} = \frac{1}{\ell(n-1)} \bigl(0.0282\,n + \ell\,(0.0385\,n - 0.0667)\bigr).$$
Since this expression is an approximation for $I$ from the ciphertext, we may
replace $p$ by $I$ from the ciphertext, and by solving with respect to $\ell$ we obtain
Friedman’s formula for the approximate keyword length $\ell$:

$$\ell \approx \frac{0.0282\,n}{(n-1)\,I - 0.0385\,n + 0.0667}, \tag{1.2}$$

where $I$ is the empirical coincidence index of the ciphertext.
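
The two tests for the keyword length can be sketched in Python as follows. This is a minimal sketch with hypothetical function names, not a complete attack. The Kasiski part simply takes the gcd of the distances between consecutive occurrences of repeated substrings, which is an assumption made for brevity: accidental repetitions can spoil a plain gcd, so in practice one would rather look for the most frequent common factors. The Friedman part evaluates the index of coincidence and formula (1.2), with the constants 0.0667 and 0.0385 from the text as default parameters.

```python
from collections import Counter
from functools import reduce
from math import gcd


def kasiski_gcd(ciphertext, length=3):
    """gcd of the distances between consecutive repeats of substrings of a given length."""
    last_seen = {}
    distances = []
    for i in range(len(ciphertext) - length + 1):
        gram = ciphertext[i:i + length]
        if gram in last_seen:
            distances.append(i - last_seen[gram])
        last_seen[gram] = i
    return reduce(gcd, distances, 0), distances


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement coincide (needs len >= 2)."""
    n = len(text)
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def friedman_length(text, kappa_lang=0.0667, kappa_rand=0.0385):
    """Approximate keyword length according to formula (1.2)."""
    n = len(text)
    ic = index_of_coincidence(text)
    return (kappa_lang - kappa_rand) * n / ((n - 1) * ic - kappa_rand * n + kappa_lang)


if __name__ == "__main__":
    # Toy string (hypothetical): the trigram ABC repeats at distance 7.
    print(kasiski_gcd("ABCDEFGABCXYZWABC"))
```

On a real ciphertext one would compare the (real-valued) Friedman estimate with the divisors suggested by the Kasiski distances and then attack each column as a simple shift cipher.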



1.2 The One Time Pad, Perfect Secrecy, and Cascade 
Ciphers 

The method of attack described in the foregoing section becomes more and more difficult as the keyword becomes longer and is "random enough". If, as a keyword, one takes a random string of the same length as the plaintext itself, then the ciphertext becomes a random string, too, and thus the system is theoretically (or "perfectly") secret (or "secure"). This system is called the One-Time Pad and was invented in 1917 by G. S. Vernam (1890-1960) (that is why it is also called the "Vernam cipher"). But how practicable is it, if the key (which also has to be transferred once from Alice to Bob) must have the same length as the plaintext? Do we really gain something? The answer is yes, for the key can be exchanged at any time before the transmission of the message becomes necessary, e.g. by some trustworthy courier. But it is important that any key is used only once (and then destroyed), for if two messages $x_1 x_2 \ldots x_n$ and $x'_1 x'_2 \ldots x'_n$ have been encrypted by the same key $z_1 z_2 \ldots z_n$ to give the ciphertexts $y_1 y_2 \ldots y_n$ resp. $y'_1 y'_2 \ldots y'_n$, then $y_i + y'_i = x_i + x'_i$ (for binary strings, with addition mod 2). So the sum of the two plaintexts is immediately known, which reveals a lot of information!
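The danger of key reuse can be made concrete with a small sketch. It assumes that messages are byte strings and that "addition" is bitwise XOR; the helper name `xor_bytes` and the two sample messages are purely illustrative.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise sum mod 2 of two equally long byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


msg1 = b"ATTACK AT DAWN"
msg2 = b"RETREAT AT TEN"
key = secrets.token_bytes(len(msg1))  # random pad, as long as the message

c1 = xor_bytes(msg1, key)  # first use of the pad
c2 = xor_bytes(msg2, key)  # forbidden second use of the same pad

# The key cancels out: an eavesdropper learns the sum of the plaintexts.
leak = xor_bytes(c1, c2)
```

Here `leak` equals `xor_bytes(msg1, msg2)` regardless of the key, which is exactly the identity $y_i + y'_i = x_i + x'_i$.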

Let us discuss the notion of perfect secrecy in some more detail. 

Definition 1.1. A cryptosystem is said to have perfect secrecy if for all plaintexts $X$ and all ciphertexts $Y$, we have

$$P(X \mid Y) = P(X).$$

Generally, perfectly secret cryptosystems can be characterized as follows:

Theorem 1.1. Assume $P(X) > 0$ for any plaintext $X$ and assume that the key space has the same size as the space of possible ciphertexts. Then a cryptosystem has perfect secrecy iff the distribution over the key space is uniform and for any plaintext $X$ and any ciphertext $Y$ there is exactly one key $Z$ that encrypts $X$ to $Y$.

Proof: 1. We first prove the "only if" direction. Let $X$ denote a plaintext and assume there is a ciphertext $Y$ such that there is no key $Z$ that encrypts $X$ to $Y$. Then

$$P(X \mid Y) = 0 < P(X),$$

which contradicts the definition of perfect secrecy, so at least one key $Z$ encrypting $X$ to $Y$ must exist. But since by the assumption there are exactly as many keys as ciphertexts, $Z$ must be unique. It remains to prove the uniformity of the distribution of the keys. Denote by $Z(X)$ the key that encrypts the plaintext $X$ to the ciphertext $Y$. By Bayes' rule, we have

$$P(X \mid Y) = \frac{P(Y \mid X) P(X)}{P(Y)} = \frac{P(Z(X)) P(X)}{P(Y)}. \qquad (1.3)$$

By perfect secrecy, $P(X \mid Y) = P(X)$, so that (1.3) implies $P(Z(X)) = P(Y)$. So $P(Z(X))$ is the same for any plaintext $X$, and uniformity follows from the fact that any key $Z$ has the property $Z = Z(X)$ for some plaintext $X$.

2. Now we pass to the "if" part. For all $X, Y$ there is exactly one key $Z = Z(X, Y)$ that encrypts $X$ to $Y$. Again by Bayes' rule (as in (1.3)),

$$P(X \mid Y) = \frac{P(X) P(Z(X, Y))}{\sum_{X'} P(X') P(Z(X', Y))} \qquad (1.4)$$

(where the sum in the denominator runs over all plaintexts $X'$). Since all $P(Z(X, Y))$ are equal, the denominator in (1.4) is equal to the reciprocal value of the size of the key space, and hence $P(X \mid Y) = P(X)$. □
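The "if" direction can be checked by exact computation on a toy system. The sketch below assumes a shift cipher on an alphabet of 5 symbols (so that every pair of plaintext and ciphertext is linked by exactly one key) and an arbitrary non-uniform plaintext distribution; all names and numbers are illustrative and not taken from the text.

```python
from fractions import Fraction

M = 5  # size of plaintext, ciphertext and key space (illustrative)
prior = {x: Fraction(x + 1, 15) for x in range(M)}  # non-uniform P(X)


def encrypt(x: int, z: int) -> int:
    """Shift cipher: exactly one key maps a given x to a given y."""
    return (x + z) % M


def joint(x: int, y: int, key_dist: dict) -> Fraction:
    """P(X = x, Y = y) for the given key distribution."""
    return prior[x] * sum(p for z, p in key_dist.items() if encrypt(x, z) == y)


def posterior(x: int, y: int, key_dist: dict) -> Fraction:
    """P(X = x | Y = y) by Bayes' rule, as in (1.4)."""
    total = sum(joint(xx, y, key_dist) for xx in range(M))
    return joint(x, y, key_dist) / total


uniform_keys = {z: Fraction(1, M) for z in range(M)}
biased_keys = {0: Fraction(1, 2), 1: Fraction(1, 8), 2: Fraction(1, 8),
               3: Fraction(1, 8), 4: Fraction(1, 8)}
```

With `uniform_keys` the posterior equals the prior for every pair $(x, y)$; with `biased_keys` it does not, in line with the uniformity requirement of Theorem 1.1.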

A notion related to perfect secrecy is semantic security, which will be treated 
in more detail in Chapter 10. The effect of perfect secrecy is that the adver- 
sary, even if he has unlimited computer resources, can gain no information 
about the plaintext from the ciphertext, except its length if this is not a 
known parameter (see Theorem 10.1). The disadvantage of the requirement 
of perfect secrecy is that the key must be at least as long as the plaintext. 
Roughly speaking, semantic security is a polynomially bounded variant of 
perfect secrecy, i.e. one assumes that the adversary has only polynomially 
bounded computer resources. 

A word about cascade ciphers: A cascade cipher is a sequence of component ciphers $C_i$ ($i = 1, 2, \ldots, r$), where the output $Y_i$ of cipher $C_i$ is used as input $X_{i+1}$ for cipher $C_{i+1}$. In every component cipher, a key $Z_i$ is used:

$$Y_i = C_i(X_i, Z_i) = C_i(Y_{i-1}, Z_i).$$

It is assumed that the keys $Z_1, Z_2, \ldots, Z_r$ are statistically independent (otherwise one speaks of a product cipher). So the input $X$ for the whole cascade cipher is $X = X_1$, whereas the output is $Y = Y_r$. Now one is tempted to believe that a cascade cipher is at least as hard to break as its hardest component. But as Massey and Maurer (1993) have shown, this is only true for "pure" known-plaintext, chosen-plaintext, and chosen-ciphertext attacks in which Eve can not make use of information about the statistics of the plaintext. As soon as the statistics of the plaintext are known, a cascade cipher can possibly be easier to break than its hardest component, as the following counterexample shows: Let $C_1, C_2$ be two block ciphers with input/output alphabet consisting of the 4 letters A, B, C, D. Assume that the keys $Z_1$ and $Z_2$ are independent unbiased random bits. The component ciphers $C_i$ transform the alphabet as follows (with a slight abuse of notation):

$$C_1((A,B,C,D), 0) := (C,D,A,B),$$
$$C_1((A,B,C,D), 1) := (C,D,B,A),$$
$$C_2((A,B,C,D), 0) := (C,D,A,B),$$
$$C_2((A,B,C,D), 1) := (D,C,A,B).$$

Now we assume that for the plaintext statistics we have $P(C) = P(D) = 0$. Then $C_1$ is completely insecure for this plaintext source, but $C_2$ is perfectly secret since the plaintext and the ciphertext are statistically independent. But on the other hand, the cascade cipher $C_2 \circ C_1$ is completely insecure, since it is just the identity transformation on $\{A, B\}$! All one can prove in general is that a cascade cipher is at least as secure as the first component cipher $C_1$ (see Massey, Maurer (1993), "Cascade ciphers: The importance of being first"). If $C_1 = C_2 = \ldots = C_r$, then of course (since the components commute), the iteration cipher is at least as secure as the component ciphers themselves. This setup will be considered in more detail in Chapter 9.
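The counterexample is small enough to be checked exhaustively. In the sketch below each cipher is stored as a table giving, for each key bit, the images of (A, B, C, D); the names `C1`, `C2`, `apply_cipher` and `cascade` are illustrative.

```python
ALPHABET = "ABCD"

# For each key bit: the images of A, B, C, D under the component cipher.
C1 = {0: "CDAB", 1: "CDBA"}
C2 = {0: "CDAB", 1: "DCAB"}


def apply_cipher(table: dict, key: int, letter: str) -> str:
    """Encrypt one letter with the given component cipher and key bit."""
    return table[key][ALPHABET.index(letter)]


def cascade(z1: int, z2: int, letter: str) -> str:
    """C2 applied to the output of C1, with independent key bits z1, z2."""
    return apply_cipher(C2, z2, apply_cipher(C1, z1, letter))


# On the plaintext source {A, B} the cascade ignores both keys.
cascade_outputs = {
    (z1, z2, x): cascade(z1, z2, x)
    for z1 in (0, 1) for z2 in (0, 1) for x in "AB"
}
```

Every entry of `cascade_outputs` maps the letter to itself, whereas `C2` alone sends A to C or D depending on its key bit.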

Theorem 1.2. A cascade of n ciphers is at least as difficult to break as the 
first component. 

Proof: Consider an oracle that gives, upon request, the keys of all compo- 
nent ciphers in the cascade except the key of the first component. Breaking 
the cascade with the oracle’s help can not be more difficult than breaking it 
without this help because the oracle’s information can always be disregarded. 
However, breaking the cascade with the oracle’s help is equivalent to breaking 
the first component cipher with the oracle’s help because on the one hand 
every cryptogram of the cascade can with assumed negligible computation be 
converted into the corresponding cryptogram for the first component cipher 
and vice versa, and on the other hand the plaintexts of the first component 
cipher and the cascade are the same. However, since the information pro- 
vided by the oracle is statistically independent of the first key, it follows that 
breaking only the first component cipher with the oracle’s help is equiva- 
lent to breaking this first component without the oracle’s help. Or - in other 
words - it follows from the fact that if the cryptanalyst (Eve) attacking the 
first component cipher wishes to embed that component cipher in an artifi- 
cial cascade in which she herself chooses the second and all subsequent keys 
(independently of the first key by assumption) so as to avail herself of the
oracle's aid, then she already possesses all the information that the oracle
can provide. So breaking the first component cipher can not be more difficult
than breaking the whole cascade cipher. □




2 RSA and Probabilistic Prime Number Tests 



2.1 General Considerations and the RSA System 

The RSA cryptosystem (named after R. Rivest, A. Shamir, and L. Adleman, who published it in the 1970s) is one of the best-known so-called public-key cryptosystems. The idea is the following: Every participant chooses two different big primes $p$ and $q$ "at random" and calculates their product $n = pq$. Then he chooses some arbitrary natural number $e$ that is relatively prime to the Euler totient function $\varphi(n)$ (which denotes the number of natural numbers smaller than $n$ that are relatively prime to $n$ or, in other words, the number of invertible elements mod $n$). In our situation, we have $\varphi(n) = (p-1)(q-1)$. So for $e$ one can take, e.g., any prime larger than $(p-1)(q-1)$ or, what makes the decoding and encryption in the binary system especially simple, the 4th Fermat number $F_4 := 2^{2^4} + 1 = 65\,537$ (= 1'0000'0000'0000'0001 in the binary system). The pair $(n, e)$ is the so-called public key of the participant, which he publishes and which will be known to everybody. As his secret key, he keeps the solution $d < \varphi(n)$ of the equation

$$ed \equiv 1 \pmod{(p-1)(q-1)}. \qquad (2.1)$$

This solution can be found rapidly by the Euclidean algorithm if $p$ and $q$ are known. But factorizing numbers $n$ seems to be computationally hard in the sense that no polynomial-time algorithm for it on a classical computer is known. Moreover, there is no known algorithm to solve (2.1) faster than by finding $p$ and $q$. But the actual equivalence has not been proved up to now. See also Boneh, Venkatesan (1998). There are similar systems (however, with other disadvantages) where breaking the system is provably equivalent to finding the secret key, for example the Rabin system (Kranakis (1986)) or the Williams (1980) algorithm. For convenience, we will now write $(n_A, e_A)$ and $(n_B, e_B)$ for the public keys of Alice and Bob, resp., and $d_A$ and $d_B$ for their respective secret keys. Assume Alice wants to send a message $x$ (w.l.o.g. in the form of a natural number mod $n_B$) to Bob. For that, she calculates the ciphertext $y$ (which will also be a natural number mod $n_B$) by

$$y := x^{e_B} \pmod{n_B} \qquad (2.2)$$

and sends this to Bob. Bob will make the decoding

$$x = y^{d_B} \pmod{n_B} \qquad (2.3)$$

(which follows from (2.2) by (2.1) and Fermat's Little Theorem). So the RSA system seems to ensure confidentiality. The system can also be used to ensure authenticity: For that, Alice sends, in addition to the encrypted message $x$, her "electronic signature" $m$, encrypted by

$$u := m^{d_A} \pmod{n_A}, \qquad (2.4)$$

to Bob. Finding $d_A$ from $u$ is the so-called discrete logarithm problem, which is also believed to be hard. So by signing, Alice does not reveal her private key $d_A$. Since $d_A$ is only known to Alice, she alone can have produced $u$, so $u$ really has the role of a "signature". On the other hand, Bob can verify that this is really Alice's signature by checking if

$$u^{e_A} \overset{?}{=} m \pmod{n_A}. \qquad (2.5)$$
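Equations (2.1)-(2.5) can be followed step by step on toy numbers. The sketch below is for illustration only: the primes are far too small for any real use, the helper name `make_keypair` is an assumption of this example, and the modular inverse via `pow(e, -1, phi)` needs Python 3.8 or later.

```python
from math import gcd


def make_keypair(p: int, q: int, e: int):
    """Return ((n, e), d) with e*d = 1 mod (p-1)(q-1), cf. (2.1)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to phi(n)")
    d = pow(e, -1, phi)  # extended Euclidean algorithm under the hood
    return (n, e), d


(n_A, e_A), d_A = make_keypair(61, 53, 17)  # Alice (toy primes)
(n_B, e_B), d_B = make_keypair(89, 97, 5)   # Bob (toy primes)

x = 1234                        # message, a number mod n_B
y = pow(x, e_B, n_B)            # encryption (2.2)
x_decoded = pow(y, d_B, n_B)    # decoding (2.3)

m = 42                          # signature text, a number mod n_A
u = pow(m, d_A, n_A)            # signing (2.4)
signature_ok = pow(u, e_A, n_A) == m   # verification (2.5)
```

Bob recovers `x_decoded == x`, and the check (2.5) succeeds because only the holder of `d_A` could have produced `u`.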

A probabilistic (or so-called Monte Carlo) primality test is an algorithm $A_P(n)$ that, for the input $n$, gives one of the two answers "prime" or "composite" such that if it yields "composite", then $n$ is composite, and if it yields "prime", then $n$ is indeed prime with high probability. It seems to be a general fact in prime number testing that if, in the case of the output "prime", one is satisfied that this answer is correct only up to some small error probability, then the test runs much faster; in other words, what costs most effort is to obtain absolute certainty in improbable cases. At least theoretically, a major breakthrough was achieved by Agrawal et al. (2003), who gave an unconditional (i.e., not depending on any unproven assumption such as the Extended Riemann Hypothesis (see Section 2.2)) deterministic polynomial-time algorithm to decide whether or not a number is prime (see also Bornemann (2002), Bernstein (2002), and New York Times 8/8/2002). In detail, for a probabilistic primality test one defines a so-called primality sequence $P = \{P_n\}_{n \ge 1}$ of sets of natural numbers with the following properties:

(i) $P_n \subset \mathbb{Z}_n^*$ (= group of integers mod $n$ relatively prime to $n$).

(ii) Given $b \in \mathbb{Z}_n^*$ one may check in time polynomial in the length of the binary expansion of $n$ if $b \in P_n$.

(iii) If $n$ is prime, then $P_n = \emptyset$.

(iv) There exists a so-called primality constant $\varepsilon \in\, ]0,1[$ (independent of $n$) such that for all sufficiently large composite odd $n > 1$ one has $P(x \in \mathbb{Z}_n^* : x \notin P_n) \le \varepsilon$.

Now the test algorithm works as follows:

- Input: $n > 2$.

- Choose an integer $x \in \mathbb{Z}_n^*$ at random.

- Output: $A_P(n) =$ "prime" if $x \notin P_n$ and $A_P(n) =$ "composite" if $x \in P_n$.

If the test is run sufficiently many times (with independent values for $x$), then the error probability can be made arbitrarily small: after $m$ independent runs,

$$P(A_P(n) = \text{"prime" in every run, although } n \text{ is composite}) \le \varepsilon^m.$$



2.2 The Solovay-Strassen Test 

This test uses a well-known object from number theory, the so-called Legendre-Jacobi symbol $(x|n)$. If $p$ is a prime and $x \in \mathbb{Z}_p^*$, then the Legendre symbol is defined as $(x|p) = 1$ if $x$ is a quadratic residue modulo $p$ and $(x|p) = -1$ else. By Euler's criterion (see, e.g., Kranakis (1986), Theorem 1.11), for all odd primes $p$ one can calculate the Legendre symbol explicitly as

$$(x|p) \equiv x^{(p-1)/2} \pmod{p}.$$

Now, for general odd $n$ and $x \in \mathbb{Z}_n^*$, one defines the Legendre-Jacobi symbol by

$$(x|n) = \prod_{i=1}^{t} (x|p_i)^{k_i}$$

if $n = \prod_{i=1}^{t} p_i^{k_i}$ denotes the prime factorization of $n$.

Now the primality sequence of the Solovay-Strassen test is defined based on Euler's criterion:

$$P_n = \{x \in \mathbb{Z}_n^* : x^{(n-1)/2} \not\equiv (x|n) \pmod{n}\}.$$

From Euler's criterion, conditions (i)-(iii) for primality sequences are fulfilled. It remains to prove (iv). For this, we need some preparation.
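A minimal sketch of the resulting test follows. The Jacobi symbol is computed with the standard reciprocity-based algorithm (which does not need the factorization of $n$; this choice is an addition of the example, not part of the text), and the names `jacobi`, `is_euler_witness` and `solovay_strassen` as well as the default of 20 rounds are illustrative.

```python
import random
from math import gcd


def jacobi(x: int, n: int) -> int:
    """Legendre-Jacobi symbol (x|n) for odd n > 0, via quadratic reciprocity."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    x %= n
    result = 1
    while x != 0:
        while x % 2 == 0:          # pull out factors of 2: (2|n) rule
            x //= 2
            if n % 8 in (3, 5):
                result = -result
        x, n = n, x                # reciprocity step
        if x % 4 == 3 and n % 4 == 3:
            result = -result
        x %= n
    return result if n == 1 else 0


def is_euler_witness(x: int, n: int) -> bool:
    """True if x lies in P_n, i.e. x^((n-1)/2) differs from (x|n) mod n."""
    return pow(x, (n - 1) // 2, n) != jacobi(x, n) % n


def solovay_strassen(n: int, rounds: int = 20, rng=random) -> bool:
    """Return False if n is certainly composite, True if n is probably prime."""
    if n < 3 or n % 2 == 0:
        return n == 2
    for _ in range(rounds):
        x = rng.randrange(1, n)
        if gcd(x, n) != 1 or is_euler_witness(x, n):
            return False
    return True
```

For a prime $n$ no witness exists, so the answer is always "prime"; for a composite $n$ each round has a chance of hitting a witness, and the rounds are independent.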

Denote by $\nu_m(t)$ the largest $k$ such that $m^k \mid t$.

Lemma 2.1. Let $n = \prod_{i=1}^{t} p_i^{k_i}$ be the prime factorization of the odd integer $n$ (i.e., the $p_i$ are the different prime factors of $n$) and $m \in \mathbb{N}$. Put $\nu := \min\{\nu_2(p_i - 1) : i = 1, 2, \ldots, t\}$ and $s := \prod_{i=1}^{t} \gcd(m, \varphi(p_i^{k_i}))$. Then

(1) The equation $x^m \equiv 1 \pmod{n}$ has exactly $s$ solutions.

(2) There exists some $x$ with $x^m \equiv -1 \pmod{n}$ iff $\nu_2(m) < \min\{\nu_2(p_i - 1) : i = 1, 2, \ldots, t\}$.

(3) If the equation $x^m \equiv -1 \pmod{n}$ has a solution, then it has exactly $s$ solutions.

Proof: Let (for a prime power $p^k$ and a generator $g$ of $\mathbb{Z}_{p^k}^*$) the shorthand $\mathrm{index}_{p^k,g}(x)$ denote the unique $j < \varphi(p^k)$ such that $x \equiv g^j \pmod{p^k}$. For each $i \in \{1, 2, \ldots, t\}$, let $g_i$ be a generator of $\mathbb{Z}_{p_i^{k_i}}^*$. Taking indexes on both sides of the equation $x^m \equiv a \pmod{n}$ one gets

$$m \cdot \mathrm{index}_{p_i^{k_i}, g_i}(x) \equiv \mathrm{index}_{p_i^{k_i}, g_i}(a) \pmod{\varphi(p_i^{k_i})}.$$

Substituting $a = 1$ yields

$$m \cdot \mathrm{index}_{p_i^{k_i}, g_i}(x) \equiv 0 \pmod{\varphi(p_i^{k_i})}, \qquad (2.6)$$

whereas for $a = -1$ we get $\mathrm{index}_{p_i^{k_i}, g_i}(-1) = \varphi(p_i^{k_i})/2$ and thus

$$m \cdot \mathrm{index}_{p_i^{k_i}, g_i}(x) \equiv \varphi(p_i^{k_i})/2 \pmod{\varphi(p_i^{k_i})}. \qquad (2.7)$$

Now (1) of Lemma 2.1 follows from (2.6) and the theorem on the solution of linear congruences. The same theorem also implies that (2.7) has a solution iff

$$\gcd(m, \varphi(p_i^{k_i})) \mid \varphi(p_i^{k_i})/2$$

for all $i = 1, 2, \ldots, t$. But the latter holds exactly iff $\nu_2(m) < \min\{\nu_2(p_i - 1) : i = 1, 2, \ldots, t\}$. □
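For small moduli the three statements of Lemma 2.1 can be compared with a brute-force count. This is only a sanity check under the assumption that trial division is fast enough for the chosen range; the function names are illustrative.

```python
from math import gcd


def prime_power_factors(n: int):
    """Return [(p, k), ...] with n = product of p**k, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            k = 0
            while n % p == 0:
                n //= p
                k += 1
            factors.append((p, k))
        p += 1
    if n > 1:
        factors.append((n, 1))
    return factors


def nu2(t: int) -> int:
    """2-adic valuation: largest k with 2**k dividing t (t >= 1)."""
    k = 0
    while t % 2 == 0:
        t //= 2
        k += 1
    return k


def predicted_counts(n: int, m: int):
    """(#solutions of x^m = 1, #solutions of x^m = -1) mod n per Lemma 2.1."""
    factors = prime_power_factors(n)
    s = 1
    for p, k in factors:
        s *= gcd(m, p ** (k - 1) * (p - 1))   # gcd(m, phi(p^k))
    solvable = nu2(m) < min(nu2(p - 1) for p, _ in factors)
    return s, (s if solvable else 0)


def brute_force_counts(n: int, m: int):
    """Count the solutions directly by trying every residue."""
    plus = sum(1 for x in range(1, n) if pow(x, m, n) == 1)
    minus = sum(1 for x in range(1, n) if pow(x, m, n) == n - 1)
    return plus, minus
```

For example, for $n = 45$ and $m = 2$ both functions give 4 solutions of $x^2 \equiv 1$ and none of $x^2 \equiv -1$.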

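The next is a lemma due to Monier:

Lemma 2.2. Let $n$ be odd and assume $p_1, p_2, \ldots, p_t$ are the distinct prime factors of $n$. Then one can write

$$|\mathbb{Z}_n^* \setminus P_n| = \delta_n \prod_{i=1}^{t} \gcd\Bigl(\frac{n-1}{2}, p_i - 1\Bigr)$$

where $\delta_n$ assumes one of the values $1/2, 1, 2$.

Proof: Define the multiplicative group endomorphisms $f_n, g_n, h_n$ of $\mathbb{Z}_n^*$ by

$$f_n(x) := x^{(n-1)/2} \pmod{n},$$
$$g_n(x) := (x|n) \pmod{n},$$
$$h_n(x) := (x|n) \cdot x^{(n-1)/2} \pmod{n}.$$

Let $K_n, L_n, M_n$ be the kernels of $f_n, g_n, h_n$, resp., and denote

$$K'_n := \{x \in \mathbb{Z}_n^* : f_n(x) \equiv -1 \pmod{n}\},$$
$$L'_n := \{x \in \mathbb{Z}_n^* : g_n(x) \equiv -1 \pmod{n}\},$$
$$M'_n := \{x \in \mathbb{Z}_n^* : h_n(x) \equiv -1 \pmod{n}\}.$$

Clearly $M_n = \mathbb{Z}_n^* \setminus P_n$. By Lemma 2.1 it follows that

$$|K_n| = \prod_{i=1}^{t} \gcd\Bigl(\frac{n-1}{2}, p_i - 1\Bigr).$$

However, $M_n = (K_n \cap L_n) \cup (K'_n \cap L'_n)$. Thus

$$|M_n| = \begin{cases} |K_n \cap L_n| & : K'_n \cap L'_n = \emptyset \\ 2\,|K_n \cap L_n| & : K'_n \cap L'_n \neq \emptyset \end{cases}$$

(if $K'_n \cap L'_n \neq \emptyset$, then choose $x_0 \in K'_n \cap L'_n$ and consider the bijection $x \mapsto x x_0$ to prove $|K_n \cap L_n| = |K'_n \cap L'_n|$). A similar argument using the decomposition $K_n = (K_n \cap L_n) \cup (K_n \cap L'_n)$ can be used to show

$$|K_n \cap L_n| = \begin{cases} |K_n| & : K_n \cap L'_n = \emptyset \\ (1/2)\,|K_n| & : K_n \cap L'_n \neq \emptyset. \end{cases}$$

The assertion follows. □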



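Theorem 2.1. For all composite odd integers $n$ we have

$$\frac{|\mathbb{Z}_n^* \setminus P_n|}{\varphi(n)} \le \frac{1}{2}.$$

Proof: Let again $\prod_{i=1}^{t} p_i^{k_i}$ be the prime factorization of $n$ (i.e., $p_1, p_2, \ldots, p_t$ the distinct prime factors of $n$). By Lemma 2.2 it follows that

$$\frac{|\mathbb{Z}_n^* \setminus P_n|}{\varphi(n)} = \delta_n \prod_{i=1}^{t} \frac{\gcd\bigl(\frac{n-1}{2}, p_i - 1\bigr)}{p_i^{k_i - 1}(p_i - 1)}. \qquad (2.8)$$

If for some $i$ it holds that $k_i \ge 2$, then the right-hand side of (2.8) is bounded from above by $\delta_n/3 \le 2/3$. So $\mathbb{Z}_n^* \setminus P_n$ must be a proper subgroup of $\mathbb{Z}_n^*$ and hence $|\mathbb{Z}_n^* \setminus P_n| \le (1/2)\varphi(n)$. Thus w.l.o.g. we may assume that all $k_i = 1$. Assume $\mathbb{Z}_n^* = M_n$. Since $n$ is composite, it follows that $t \ge 2$. Assume $g$ is a generator of $\mathbb{Z}_{p_1}^*$. By the Chinese remainder theorem there exists an $x \in \mathbb{Z}_n^*$ with $x \equiv g \pmod{p_1}$ and $x \equiv 1 \pmod{n/p_1}$. By the assumption $\mathbb{Z}_n^* = M_n$ it follows that $x^{(n-1)/2} \equiv (x|n) \pmod{n}$. But $(x|n) = \prod_{i=1}^{t} (x|p_i) = (g|p_1) = -1$. So $x^{(n-1)/2} \equiv -1 \pmod{n/p_1}$, which is a contradiction to $x \equiv 1 \pmod{n/p_1}$. □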

We mention that the Solovay-Strassen test can be made deterministic if the so-called Extended Riemann Hypothesis (see, e.g., Kranakis (1986), 2.10), a famous conjecture in analytic number theory, is true. This conjecture asserts the following: Let $\chi$ be a so-called character modulo $n$, i.e., a group homomorphism $\chi : \mathbb{Z}_n^* \to \mathbb{C} \setminus \{0\}$, extended to $\mathbb{N}$ by $\chi(x) := 0$ if $\gcd(x, n) \neq 1$. Then the Dirichlet L-series with respect to the character $\chi$ is defined as

$$L_\chi(z) = \sum_{k=1}^{\infty} \frac{\chi(k)}{k^z},$$

which is convergent for all complex $z$ with real part greater than 1 and can be meromorphically extended to an analytic function for all complex $z$ with positive real part. Now the Extended Riemann Hypothesis is the conjecture that all zeroes of $L_\chi$ with real part in $]0, 1]$ have in fact real part $1/2$. Up to now, the Extended Riemann Hypothesis has not yet been proved, but there is overwhelming evidence (both by theoretical arguments and numerical calculations) that it really holds (see e.g. Odlyzko (2001)).



2.3 Rabin's Test

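Here, the primality sequence is defined as

$$P_n = \{x \in \mathbb{Z}_n^* : x^{(n-1)/2^e} \not\equiv 1 \pmod{n} \text{ and } x^{(n-1)/2^h} \not\equiv -1 \pmod{n} \text{ for all } 0 < h \le e\},$$

where $e := \nu_2(n - 1)$ (as defined before). Also here, properties (i)-(iii) of primality sequences are easily verified. We must again prove (iv). We have

$$\mathbb{Z}_n^* \setminus P_n = \{x \in \mathbb{Z}_n^* : x^{(n-1)/2^e} \equiv 1 \pmod{n} \text{ or } x^{(n-1)/2^h} \equiv -1 \pmod{n} \text{ for some } 0 < h \le e\}. \qquad (2.9)$$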
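The membership test for $P_n$ translates directly into repeated squaring. The following sketch assumes the reading of (2.9) in which the first exponent is the odd part $(n-1)/2^e$ of $n - 1$; the names `is_rabin_witness` and `rabin_test` and the default of 20 rounds are illustrative.

```python
import random


def is_rabin_witness(x: int, n: int) -> bool:
    """True if x lies in P_n for odd n > 2, i.e. none of the
    congruences in (2.9) holds for x."""
    e, u = 0, n - 1
    while u % 2 == 0:              # write n - 1 = 2^e * u with u odd
        u //= 2
        e += 1
    b = pow(x, u, n)               # x^((n-1)/2^e)
    if b == 1 or b == n - 1:       # first congruence, or h = e
        return False
    for _ in range(e - 1):         # h = e-1, ..., 1 by successive squaring
        b = b * b % n
        if b == n - 1:
            return False
    return True


def rabin_test(n: int, rounds: int = 20, rng=random) -> bool:
    """Return False if n is certainly composite, True if n is probably prime."""
    if n < 3 or n % 2 == 0:
        return n == 2
    for _ in range(rounds):
        if is_rabin_witness(rng.randrange(1, n), n):
            return False
    return True
```

For $n = 9$ the only elements outside $P_9$ are 1 and 8, so a single random choice already detects compositeness with probability $6/8$.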

Again, a lemma due to Monier determines the exact size of this set:

Lemma 2.3. Assume $n$ is a composite odd integer with prime factorization $n = \prod_{i=1}^{t} p_i^{k_i}$ ($p_i$ distinct primes). If we write $n - 1 = 2^e u$ ($u$ odd), $p_i - 1 = 2^{\mu_i} u_i$ ($u_i$ odd), and $\mu := \min\{\mu_1, \mu_2, \ldots, \mu_t\}$, then

$$|\mathbb{Z}_n^* \setminus P_n| = \Bigl(1 + \frac{2^{t\mu} - 1}{2^t - 1}\Bigr) \prod_{i=1}^{t} \gcd(u, u_i).$$

Proof: Put $s := \prod_{i=1}^{t} \gcd(u, u_i)$. By Lemma 2.1, the first congruence in (2.9) has exactly $s$ solutions. For any given $h$, the other congruence in (2.9) has a solution iff $\nu_2((n-1)/2^h) = e - h < \mu$. So, for each $h > e - \mu$ the number of solutions of the equation

$$x^{(n-1)/2^h} \equiv -1 \pmod{n}$$

is given by

$$\prod_{i=1}^{t} \gcd\Bigl(\frac{n-1}{2^h}, p_i - 1\Bigr).$$

Hence

$$|\mathbb{Z}_n^* \setminus P_n| = s + \sum_{h=e-\mu+1}^{e} \prod_{i=1}^{t} \gcd\Bigl(\frac{n-1}{2^h}, p_i - 1\Bigr).$$

Now the assertion follows from the fact that

$$\gcd\Bigl(\frac{n-1}{2^h}, p_i - 1\Bigr) = 2^{e-h} \gcd(u, u_i). \quad \Box$$
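Monier's formula can be compared with a direct count for small $n$. The sketch below is a sanity check only; it counts the elements satisfying one of the congruences in (2.9) (with first exponent $(n-1)/2^e$), and all function names are illustrative.

```python
from math import gcd


def split_power_of_two(m: int):
    """Return (k, r) with m = 2^k * r and r odd."""
    k = 0
    while m % 2 == 0:
        m //= 2
        k += 1
    return k, m


def distinct_prime_factors(n: int):
    """Distinct prime factors of n by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def monier_count(n: int) -> int:
    """Right-hand side of Lemma 2.3 for a composite odd n."""
    _, u = split_power_of_two(n - 1)
    primes = distinct_prime_factors(n)
    t = len(primes)
    s, mus = 1, []
    for p in primes:
        mu_i, u_i = split_power_of_two(p - 1)
        mus.append(mu_i)
        s *= gcd(u, u_i)
    mu = min(mus)
    # (2^(t*mu) - 1) is divisible by (2^t - 1), so // is exact here
    return (1 + (2 ** (t * mu) - 1) // (2 ** t - 1)) * s


def brute_force_count(n: int) -> int:
    """Number of x in Z_n^* satisfying one of the congruences in (2.9)."""
    e, u = split_power_of_two(n - 1)
    count = 0
    for x in range(1, n):
        if gcd(x, n) != 1:
            continue
        b = pow(x, u, n)
        hit = b == 1
        for _ in range(e):          # h = e, e-1, ..., 1
            hit = hit or b == n - 1
            b = b * b % n
        count += hit
    return count
```

For instance, both functions return 2 for $n = 9$ and 18 for $n = 91$.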

Let $e$ and $n$ be as in Lemma 2.3 and define the set

$$R_n := \{x \in \mathbb{Z}_n^* : x^{n-1} \not\equiv 1 \pmod{n} \text{ or } 1 < \gcd(x^{(n-1)/2^h} - 1, n) < n \text{ for some } 0 < h \le e\}.$$

The following lemma is due to Miller, Rabin, and Monier:

Lemma 2.4. For all odd integers $n > 2$, we have $P_n = R_n$.

Proof: Take an arbitrary $x \in \mathbb{Z}_n^*$ and consider, for each $h$ such that $2^h \mid n - 1$, the expressions

$$d(h) := \frac{n-1}{2^h},$$
$$b(h) := x^{d(h)} \pmod{n},$$

and

$$g(h) := \gcd(b(h) - 1, n).$$

Then, if $2^h \mid n - 1$, we have the properties:

$$g(h) = n \iff b(h) \equiv 1 \pmod{n}, \qquad (2.10)$$
$$g(h) = n \implies g(h-1) = n, \qquad (2.11)$$
$$b(h-1) = b(h)^2. \qquad (2.12)$$

1. We first prove that $P_n \subset R_n$. Assume the contrary and let $x \in P_n \setminus R_n$. It follows that there must be an integer $k < e$ with $g(k) = n$. As $x \in P_n$, we have $x^{(n-1)/2^e} \not\equiv 1 \pmod{n}$, so $g(e) \neq n$. Hence, there exists a $k < e$ with the property

$$g(0) = g(1) = \ldots = g(k) = n > g(k+1) = g(k+2) = \ldots = g(e) = 1.$$

Hence $b(k+1)^2 \equiv 1 \pmod{n}$ and thus $n \mid (b(k+1) - 1)(b(k+1) + 1)$. Together with the fact that $g(k+1) = \gcd(b(k+1) - 1, n) = 1$ this yields that $b(k+1) \equiv -1 \pmod{n}$, which contradicts the assumption $x \in P_n$.

2. Now let us show the relation $R_n \subset P_n$. Assume, on the contrary, that there exists some $x \in R_n \setminus P_n$. Then either $b(e) \equiv 1 \pmod{n}$ or there is some $h \in \{1, 2, \ldots, e\}$ with $b(h) \equiv -1 \pmod{n}$. In the first case $x \notin R_n$, so we may assume that $b(e) \not\equiv 1 \pmod{n}$. We may choose some $k \le e$ such that

$$b(0) = b(1) = \ldots = b(k-1) \equiv 1 \pmod{n},$$

but

$$b(k) \equiv -1 \pmod{n}.$$

From the fact that $b(k) = b(k+j)^{2^j} \equiv -1 \pmod{n}$, we get

$$b(k) - 1 = b(k+1)^2 - 1 = b(k+2)^4 - 1 = \ldots = b(e)^{2^{e-k}} - 1 \equiv -2 \pmod{n}.$$

But for all $j \le e - k$ there is an integer $s_j$ with the property

$$b(k+j)^{2^j} - 1 = (b(k+j) - 1) s_j \equiv -2 \pmod{n}.$$

As $n$ is odd and greater than 2 by assumption, we obtain that $g(k+j) = 1$ for all $j \le e - k$. Since, on the other hand, $g(j) = n$ for all $j < k$, we deduce that $g(h) \in \{1, n\}$ for all $h$. So indeed $x \notin R_n$. □

Now we are ready to calculate the primality constant, i.e., to verify property (iv) of primality sequences. Denote $q_i := p_i^{k_i}$ (as before, the $p_i$ denote the distinct prime factors of $n$ and $k_i$ the maximal power in which they occur in the prime factorization of $n$ ($i = 1, 2, \ldots, t$)). Furthermore, put $h_i := \gcd(\varphi(q_i), n - 1)$, $m_i := \varphi(q_i)/h_i$, $e_i := \nu_2(h_i)$, and $\alpha_i := \max\{e_i - e_j : j = 1, 2, \ldots, t\}$. One observes that if $e_i = \min\{e_1, e_2, \ldots, e_t\}$, then $\alpha_i = 0$. Define

$$I := \{1 \le i \le t : \alpha_i > 0\},$$
$$J := \{1 \le i \le t : \alpha_i = 0\},$$

and $\alpha := \sum_{i=1}^{t} \alpha_i$, $\beta := |J|$. We have $\beta > 0$ and $\alpha + \beta \ge t$. The following theorem gives a general expression for the primality constant of the Rabin test:

Theorem 2.2. If $n > 2$ is a composite odd integer with prime factorization $n = \prod_{i=1}^{t} p_i^{k_i}$ (the $p_i$ being the distinct prime factors), then

$$\frac{|\mathbb{Z}_n^* \setminus R_n|}{\varphi(n)} \le \frac{1}{2^{\alpha + \beta - 1}\, m_1 m_2 \cdots m_t}.$$

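Proof: Assume $x \in \mathbb{Z}_n^* \setminus R_n$. So $x^{n-1} \equiv 1 \pmod{n}$. For $i = 1, 2, \ldots, t$, denote by $g_i$ a generator of $\mathbb{Z}_{q_i}^*$. It follows that there are $s_i < \varphi(q_i)$ with $x \equiv g_i^{s_i} \pmod{q_i}$. Hence $x^{n-1} \equiv g_i^{s_i(n-1)} \equiv 1 \pmod{q_i}$ and $\varphi(q_i) \mid s_i(n-1)$. Since $\varphi(q_i) = h_i m_i$, this means $m_i \mid s_i (n-1)/h_i$, and as $\gcd(m_i, (n-1)/h_i) = 1$, we obtain the existence of $\ell_i < \varphi(q_i)/m_i$ such that $s_i = m_i \ell_i$. Hence

$$x \equiv g_i^{m_i \ell_i} \pmod{q_i} \qquad (2.13)$$

and $s_i(n-1) = m_i \ell_i (n-1) = \varphi(q_i)\, \ell_i\, (n-1)/h_i$. It can be proved that

$$2^{\alpha_i} \mid \ell_i \quad (i = 1, 2, \ldots, t). \qquad (2.14)$$

[W.l.o.g. we may assume $\alpha_i > 0$. Choose $j$ such that $\alpha_i = e_i - e_j > 0$ and define $f_i := \nu_2(n-1) - e_i \ge 0$. Thus for $\gamma_i := e_i - e_j + f_i$ we get $\nu_2(d(\gamma_i)) = e_j$, where $d(h) = (n-1)/2^h$ as in the proof of Lemma 2.4. Furthermore, $h_j \mid d(\gamma_i)$. Thus $\varphi(q_j) = h_j m_j \mid m_j d(\gamma_i)$. From (2.13) we get

$$x^{d(\gamma_i)} \equiv g_j^{m_j \ell_j d(\gamma_i)} \equiv 1 \pmod{q_j}.$$

Hence $1 < q_j \le \gcd(x^{d(\gamma_i)} - 1, n) = n$ (since $x \notin R_n$) and thus $x^{d(\gamma_i)} \equiv 1 \pmod{n}$. Together with (2.13) this implies $h_i \mid d(\gamma_i) \ell_i$. Assertion (2.14) follows.] Now (2.13) and (2.14) yield

$$|\mathbb{Z}_n^* \setminus R_n| \le \prod_{i=1}^{t} \frac{\varphi(q_i)}{2^{\alpha_i} m_i} = \frac{\varphi(n)}{2^{\alpha} \prod_{i=1}^{t} m_i}.$$

This proves the assertion if $\beta = 1$. Assume now that $\beta \ge 2$. All $e_i$ with $i \in J$ have the same common value $e^*$, say. Put $\gamma_j := f_j + 1$, which also has the same value $\gamma$, say, for all $j \in J$. So $h_j/2 \mid d(\gamma)$, but $h_j$ is not a divisor of $d(\gamma)$. However, due to (2.13), for all $j \in J$ we have

$$x^{d(\gamma)} \equiv 1 \pmod{q_j} \iff \varphi(q_j) \mid \ell_j m_j d(\gamma) \iff h_j \mid \ell_j d(\gamma).$$

On the other hand, $\gcd(x^{d(\gamma)} - 1, n) \in \{1, n\}$ since $x \in \mathbb{Z}_n^* \setminus R_n$. So $2 \mid \ell_j$ is either true for all $j \in J$ or false for all $j \in J$. Now the assertion follows from the fact that $2^{\alpha_i} \mid \ell_i$ for all $i \in I$. □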

The following corollary gives a still more explicit estimate of the primality constant for all relevant cases:

Corollary 2.1. For all odd composite integers $n > 11$, it holds that

$$\frac{|\mathbb{Z}_n^* \setminus R_n|}{\varphi(n)} \le \frac{1}{4}.$$

Proof: In case $t \ge 3$, the corollary follows directly from Theorem 2.2. The same is the case if $t = 2$ and either $m_1 \ge 2$ or $m_2 \ge 2$ (since $t = 2$ implies $\alpha + \beta - 1 \ge 1$). We consider first the case $t = 2$, $m_1 = m_2 = 1$. So we may write $n = p_1 p_2$, w.l.o.g. with $p_1 < p_2$. But then

$$p_2 - 1 = \varphi(p_2) \mid n - 1 = p_1(p_2 - 1) + (p_1 - 1),$$

which is not possible. So there remains the case $t = 1$. If we write $n = p^k$ for some $k \ge 2$, we get

$$|\mathbb{Z}_n^* \setminus R_n| \le |\{x \in \mathbb{Z}_n^* : x^{n-1} \equiv 1 \pmod{n}\}| \le \gcd(n - 1, p - 1) = p - 1$$

and thus

$$\frac{|\mathbb{Z}_n^* \setminus R_n|}{\varphi(n)} \le \frac{p - 1}{p^{k-1}(p - 1)} = \frac{1}{p^{k-1}} \le \frac{1}{4},$$

since $n = p^k > 11$ forces $p^{k-1} \ge 4$. □

For integers $n$ with many different prime factors, we have even a better estimate of the primality constant (see Kranakis (1986), Theorem 2.34):

Corollary 2.2. For odd integers $n > 2$ whose number of distinct prime factors is $t$, we have

$$\frac{|\mathbb{Z}_n^* \setminus R_n|}{\varphi(n)} \le \frac{1}{2^{t-1}}.$$

For further quantitative results in this context see Damgård et al. (1993).

2.4 *Bit Security of RSA 

Denote by $\mathrm{Lsb}(x)$ the least significant bit of the natural number $x$ (represented in its binary expansion). By a little abuse of notation, we will also write $\mathrm{Lsb}(x)$ for $x := x \pmod{n}$ represented as an element of $\{0, 1, \ldots, n-1\}$. As before, let $p, q$ be distinct odd primes, $n := pq$. Assume the RSA exponent $e$ is relatively prime to $\varphi(n)$. The following theorem says that if a polynomial-time algorithm for calculating the least significant bit of the plaintext $x$ exists, then a polynomial-time algorithm for calculating the whole of $x$ also exists. Similar considerations can also be made, e.g., for the Rabin system, see Kranakis (1986), 5.7 and also Delfs, Knebl (2002), 7.

Theorem 2.3. If there exists a polynomial-time algorithm

$$A_1 = A_1(n, e, y) = \mathrm{Lsb}(x) \quad (x \in \mathbb{Z}_n^*),$$

where $y = x^e \pmod{n}$, then there is also a polynomial-time algorithm

$$A_2 = A_2(n, e, y) = x \quad (x \in \mathbb{Z}_n^*).$$

Proof: The method of proof is rational approximation, i.e. to calculate $a \in \mathbb{Z}_n^*$ and $u \in [0, 1[\, \cap\, \mathbb{Q}$ such that

$$|ax - un| < \frac{1}{2}$$

(here and below, $ax$ stands for its representative in $\{0, 1, \ldots, n-1\}$). The algorithm proceeds recursively. Let $u_0 := 0$ and $a_0 := 1$ be the starting values. Let $2^{-1}$ denote the inverse of $2 \pmod{n}$. Define recursively

$$a_t := 2^{-1} a_{t-1} \pmod{n}$$

and

$$u_t := \frac{1}{2}\bigl(u_{t-1} + \mathrm{Lsb}(a_{t-1} x)\bigr).$$

With this definition, we obtain

$$a_t x = 2^{-1} a_{t-1} x = \begin{cases} \frac{1}{2} a_{t-1} x & : a_{t-1} x \equiv 0 \pmod{2} \\ \frac{1}{2}(a_{t-1} x + n) & : a_{t-1} x \equiv 1 \pmod{2}. \end{cases} \qquad (2.15)$$

By the fact that

$$\mathrm{Lsb}(a_t x) = A_1(n, e, a_t^e y)$$

it is possible to decide whether $a_t x$ is even from the data available to Eve (i.e., without knowing $x$ explicitly) and thus the recursion step (2.15) can really be done by Eve. So

$$|a_0 x - u_0 n| < n,$$
$$|a_t x - u_t n| = \frac{1}{2} |a_{t-1} x - u_{t-1} n|,$$

and hence after $|n| + 1$ steps (where $|n|$ means the length of the binary expansion of $n$) she will have

$$|a_{|n|+1} x - u_{|n|+1} n| < \frac{n}{2^{|n|+1}} < \frac{1}{2}. \qquad (2.16)$$

But from (2.16) it follows that

$$a_{|n|+1} x = \Bigl\lfloor u_{|n|+1} n + \frac{1}{2} \Bigr\rfloor$$

and hence

$$x = a_{|n|+1}^{-1} \Bigl\lfloor u_{|n|+1} n + \frac{1}{2} \Bigr\rfloor \pmod{n}. \quad \Box$$
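The recursion of the proof can be run on toy parameters. In the sketch below the oracle $A_1$ is simulated with the secret exponent $d$ (an assumption made purely to have something to call; a real attacker would of course not have $d$), exact rationals stand in for $u_t$, and the names `recover_plaintext` and `toy_oracle` are illustrative. The modular inverse via `pow(..., -1, n)` needs Python 3.8 or later.

```python
from fractions import Fraction
from math import floor


def recover_plaintext(n: int, e: int, y: int, lsb_oracle) -> int:
    """Recover x from y = x^e mod n, given an oracle c -> Lsb(plaintext of c)."""
    inv2 = pow(2, -1, n)              # 2^{-1} mod n (n is odd)
    a = 1                             # a_0
    u = Fraction(0)                   # u_0
    for _ in range(n.bit_length() + 1):
        # a^e * y encrypts a*x, so the oracle returns Lsb(a_{t-1} x)
        bit = lsb_oracle(pow(a, e, n) * y % n)
        u = (u + bit) / 2             # u_t
        a = a * inv2 % n              # a_t
    ax = floor(u * n + Fraction(1, 2))    # a_t x mod n, by (2.16)
    return pow(a, -1, n) * ax % n


# Toy RSA instance; the oracle cheats by decrypting with d.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))


def toy_oracle(c: int) -> int:
    return pow(c, d, n) & 1


x = 1234
y = pow(x, e, n)
recovered = recover_plaintext(n, e, y, toy_oracle)
```

After $|n| + 1 = 13$ oracle calls the rounded value of $u_t n$ determines $a_t x$, and multiplying by $a_t^{-1}$ gives back `x`.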

An analogue of Theorem 2.3 also exists for probabilistic algorithms. 

Definition 2.1. A probabilistic algorithm is an algorithm $A$ that, during the computation of the output $y$ from the input $x$, is allowed to generate a finite number of independent unbiased random bits, and the next step may depend on the results of the preceding random bits. The number of random bits may depend on the outcome of the previous ones, but is bounded by some constant $t_x$ for a given input $x$.

A probabilistic algorithm is called polynomial-time (or polynomial) if the running time of $A(x)$ is bounded by some polynomial $f(|x|)$ that is independent of $x$. (Generating a random bit counts as one step in the complexity of the algorithm.)

A polynomial $f(z)$ is called positive if $f(z) > 0$ for all $z > 0$. The following theorem is the probabilistic analogue of Theorem 2.3:

Theorem 2.4. Let $p, q$ be distinct odd primes and write $n := pq$ for their product. Assume $e$ is relatively prime to $\varphi(n)$ and denote $y := x^e \pmod{n}$. Let $f$ and $g$ be positive polynomials with integer coefficients. Suppose there exists a probabilistic polynomial-time algorithm $A_1$ such that, for uniformly distributed $x$ on $\mathbb{Z}_n^*$, it holds that

$$P(A_1(n, e, y) = \mathrm{Lsb}(x)) \ge \frac{1}{2} + \frac{1}{f(|n|)}.$$

Then there exists a polynomial-time algorithm $A_2$ such that

$$P(A_2(n, e, y) = x) \ge 1 - 2^{-g(|n|)}.$$

The proof of Theorem 2.4 rests on the following lemmas. The first one is just 
a consequence of a quantitative version of the Weak Law of Large Numbers: 

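Lemma 2.5. Assume $S_1, S_2, \ldots, S_n$ are pairwise independent binary random variables with common expectations $E(S_i) =: \alpha = \frac{1}{2} + \varepsilon$ ($\varepsilon > 0$). Then

$$P\Bigl(\frac{1}{n} \sum_{i=1}^{n} S_i > \frac{1}{2}\Bigr) \ge 1 - \frac{1}{4 n \varepsilon^2}.$$

Proof: Observe that

$$\Bigl|\frac{1}{n} \sum_{i=1}^{n} S_i - \alpha\Bigr| < \varepsilon$$

implies

$$\frac{1}{n} \sum_{i=1}^{n} S_i > \frac{1}{2},$$

and therefore (with the aid of Čebyšev's inequality and some straightforward calculations) we obtain

$$P\Bigl(\frac{1}{n} \sum_{i=1}^{n} S_i > \frac{1}{2}\Bigr) \ge P\Bigl(\Bigl|\frac{1}{n} \sum_{i=1}^{n} S_i - \alpha\Bigr| < \varepsilon\Bigr) \ge 1 - \frac{1}{4 n \varepsilon^2}. \quad \Box$$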
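To get a feeling for how conservative the Čebyšev-type bound is, one can compare it with the exact majority probability in the special case of fully independent variables (a stronger assumption than the pairwise independence required by the lemma). The bound $1 - 1/(4n\varepsilon^2)$ used below is the one stated in Lemma 2.5; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def chebyshev_majority_bound(n: int, eps: Fraction) -> Fraction:
    """Lower bound 1 - 1/(4 n eps^2) for P(mean of the S_i > 1/2)."""
    return 1 - 1 / (4 * n * eps * eps)


def exact_majority_probability(n: int, alpha: Fraction) -> Fraction:
    """P(S_1 + ... + S_n > n/2) for fully independent S_i with P(S_i = 1) = alpha."""
    return sum(
        comb(n, k) * alpha ** k * (1 - alpha) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


eps = Fraction(1, 10)
bound = chebyshev_majority_bound(100, eps)                        # exact rational
exact = exact_majority_probability(100, Fraction(1, 2) + eps)     # exact rational
```

With 100 votes and $\varepsilon = 1/10$ the bound evaluates to $3/4$, while the exact majority probability in the independent case is considerably larger.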



Lemma 2.6. Under the hypotheses of Theorem 2.4, there exists a probabilistic polynomial-time algorithm $L$ with the following properties: If $a, b$ are independent randomly chosen elements of $\mathbb{Z}_n^*$ (according to the uniform distribution on this set), if we take $u, v \in \mathbb{Q}$ such that

$$|ax - un| \le \frac{\varepsilon^3}{8}\, n$$

and

$$|bx - vn| \le \frac{\varepsilon}{8}\, n$$

(for some $\varepsilon > 0$ small enough), and if we put (recursively) $a_0 := a$ and $a_t := 2^{-1} a_{t-1}$, then $L$ successively computes values $\ell_t$ (for $t = 0, 1, \ldots, |n|$) such that

$$P\bigl(\ell_t = \mathrm{Lsb}(a_t x) \mid \ell_j = \mathrm{Lsb}(a_j x)\ (0 \le j \le t-1)\bigr) \ge 1 - \frac{1}{2|n|}. \qquad (2.17)$$

(In fact, we choose $a, b \in \mathbb{Z}_n$. But if $a$ or $b$ is not invertible, then we may factor $n$ just by the Euclidean algorithm.)

Proof of Lemma 2.6: Put $m := \min\{2^t/\varepsilon^2, 2|n|/\varepsilon^2\}$. Then w.l.o.g. we may assume that $p, q > m$ because otherwise, we can factorize $n$ in polynomial time just by exhaustive search.

Put first

$$\alpha := \mathrm{Lsb}(ax),$$
$$\beta := \mathrm{Lsb}(bx).$$

We now show first how to calculate $\ell_t = \mathrm{Lsb}((a_t + i a_{t-1} + b)x)$ (w.l.o.g. we may assume that $a_t + i a_{t-1} + b$ is really invertible mod $n$, for otherwise we can factor $n$ just with the Euclidean algorithm). The following subroutine (which calculates $\ell_t$, $a_t$, and $u_t$ recursively; the resulting algorithm will be called $L'$) is run. The initial values are $\ell_0 := \alpha$, $a_{t-1} := a_0 := a$, and $u_{t-1} := u$:

- $c_0 \leftarrow 0$; $c_1 \leftarrow 0$;

- $a_t \leftarrow 2^{-1} a_{t-1}$; $u_t \leftarrow \frac{1}{2}(u_{t-1} + \alpha)$;

- FOR $i = -m/2$ TO $m/2 - 1$ DO
  - $A \leftarrow a_t + i a_{t-1} + b$;
  - $W \leftarrow \lfloor u_t + i u_{t-1} + v \rfloor$;
  - $B \leftarrow (i\alpha + \beta + W) \pmod{2}$;
  - IF $A_1(n, e, A^e y \pmod{n}) + B \equiv 0 \pmod{2}$
    • THEN $c_0 \leftarrow c_0 + 1$;
    • ELSE $c_1 \leftarrow c_1 + 1$;

- $u_{t-1} \leftarrow u_t$; $a_{t-1} \leftarrow a_t$;

- IF $c_0 > c_1$
  - THEN $\alpha \leftarrow 0$;
  - ELSE $\alpha \leftarrow 1$;

- RETURN $\alpha$;

So we have got the "modified value" $\alpha := \ell_t = \mathrm{Lsb}((a_t + i a_{t-1} + b)x)$.

Now we will calculate what we really want, namely $\mathrm{Lsb}(a_t x)$. We will see that the hypotheses of Lemma 2.6 guarantee that we can indeed infer $\mathrm{Lsb}(a_t x)$ with high probability. For $i = -m/2, -m/2 + 1, \ldots, m/2 - 1$ define

$$A_{t,i} := a_t + i a_{t-1} + b,$$
$$w_{t,i} := u_t + i u_{t-1} + v,$$
$$W_{t,i} := \lfloor w_{t,i} \rfloor,$$
$$B_{t,i} := \bigl(i \cdot \mathrm{Lsb}(a_{t-1} x) + \mathrm{Lsb}(bx) + \mathrm{Lsb}(W_{t,i})\bigr) \pmod{2}.$$

We want to compute $\mathrm{Lsb}(a_t x)$ (recursively) from the data

$$\mathrm{Lsb}(A_{t,i} x),\ \mathrm{Lsb}(a_{t-1} x),\ \mathrm{Lsb}(bx).$$

Put

$$A'_{t,i} := a_t x + i \cdot a_{t-1} x + bx = wn + A_{t,i} x$$

where

$$w := \lfloor A'_{t,i}/n \rfloor.$$

Then

$$\mathrm{Lsb}(A'_{t,i}) = \bigl(\mathrm{Lsb}(a_t x) + i \cdot \mathrm{Lsb}(a_{t-1} x) + \mathrm{Lsb}(bx)\bigr) \pmod{2}$$

and

$$\mathrm{Lsb}(A_{t,i} x) = \bigl(\mathrm{Lsb}(A'_{t,i}) + \mathrm{Lsb}(w)\bigr) \pmod{2} = \bigl(\mathrm{Lsb}(a_t x) + i \cdot \mathrm{Lsb}(a_{t-1} x) + \mathrm{Lsb}(bx) + \mathrm{Lsb}(w)\bigr) \pmod{2},$$

and we obtain

$$\mathrm{Lsb}(a_t x) = \bigl(\mathrm{Lsb}(A_{t,i} x) + i \cdot \mathrm{Lsb}(a_{t-1} x) + \mathrm{Lsb}(bx) + \mathrm{Lsb}(w)\bigr) \pmod{2}.$$

Now let us determine $w$ and its least significant bit $\mathrm{Lsb}(w)$. The method will be to show that $w$ equals $W_{t,i}$ with high probability and that, on the other hand, it is really possible to compute $W_{t,i}$ in polynomial time from the available data $u_t$ (the rational approximation of $a_t x$), $u_{t-1}$ (the rational approximation of $a_{t-1} x$), and $v$ (the rational approximation of $bx$). If indeed $W_{t,i} = w$, we have

$$\mathrm{Lsb}(a_t x) = \bigl(\mathrm{Lsb}(A_{t,i} x) + B_{t,i}\bigr) \pmod{2}.$$

Now assume that the algorithm $L'$ has computed the least significant bit correctly in all preceding steps, i.e.,

$$\mathrm{Lsb}(a_j x) = \ell_j \quad (0 \le j \le t - 1).$$

We intend to give a lower bound for the probability that $W_{t,i} = w$. Denote the random variable

$$Z := |A'_{t,i} - w_{t,i}\, n|.$$

We may estimate

$$\begin{aligned} Z &= |a_t x - u_t n + i(a_{t-1} x - u_{t-1} n) + bx - vn| \\ &\le \Bigl|\frac{1}{2}(a_{t-1} x - u_{t-1} n)(1 + 2i)\Bigr| + |bx - vn| \\ &\le \frac{\varepsilon n}{8}\Bigl(\frac{\varepsilon^2 m}{2^t} + 1\Bigr) \\ &\le \frac{\varepsilon}{4}\, n. \end{aligned} \qquad (2.18)$$

Under our assumption that $\ell_j = \mathrm{Lsb}(a_j x)$ ($j = 0, 1, \ldots, t-1$) it follows (as in the proof of Theorem 2.3) that

$$|a_j x - u_j n| = \frac{1}{2} |a_{j-1} x - u_{j-1} n| \quad (1 \le j \le t).$$

Furthermore $|1 + 2i| \le m$ (since $-m/2 \le i \le m/2 - 1$). Now we observe that $W_{t,i} \neq w$ iff there is a multiple of $n$ between $A'_{t,i}$ and $w_{t,i}\, n$. The latter is not the case if the following holds:

$$\frac{\varepsilon}{4}\, n < A'_{t,i} - wn = A_{t,i} x < n - \frac{\varepsilon}{4}\, n.$$

Hence by the uniform distribution of $a$ and $b$ on $\mathbb{Z}_n^*$ it follows that the

$$A_{t,i} x = (2^{-1} + i)\, a_{t-1} x + bx$$

are also uniformly distributed and thus

$$P(W_{t,i} = w) \ge P\Bigl(\frac{\varepsilon}{4}\, n < A_{t,i} x < n - \frac{\varepsilon}{4}\, n\Bigr).$$

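Now we want to show

$$P(\varepsilon_t = \mathrm{Lsb}(a_t x) \mid \varepsilon_j = \mathrm{Lsb}(a_j x)\ (0 \le j \le t-1)) \ge 1 - \frac{1}{2|n|}. \qquad (2.19)$$

Consider the events

$$E_{1,i} := \{A_1(n, e, A_{t,i}^e y) = \mathrm{Lsb}(A_{t,i}x)\},$$

$$E_{2,i} := \left\{\frac{\varepsilon}{4}n < A_{t,i}x < n - \frac{\varepsilon}{4}n\right\}.$$

It follows that $P(E_{1,i}) \ge \frac{1}{2} + \varepsilon$ and $P(E_{2,i}) = 1 - \varepsilon/2$. Consider the indicator
random variables $I_i := \mathbf{1}_{E_{1,i} \cap E_{2,i}}$. The algorithm $L$ computes $\mathrm{Lsb}(a_t x)$
correctly in the $i$-th step if both events $E_{1,i}$ and $E_{2,i}$ occur. So it follows that

$$P(I_i = 1) \ge P(E_{2,i}) - (1 - P(E_{1,i})) \ge \frac{1}{2} + \frac{\varepsilon}{2}.$$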

Now assume $i \ne j$. Take the probabilities $P(I_i = d)$ and $P(I_j = d)$ ($d \in \{0, 1\}$)
over all random choices of $a, b \in \mathbb{Z}_n$ and the random bit generations
produced by the algorithms $A_1(n, e, A_{t,i}^e y)$ and $A_1(n, e, A_{t,j}^e y)$. If we define
the $2 \times 2$-matrix

$$\Delta := \begin{pmatrix} 2^{-1} + i & 2^{-1} + j \\ 1 & 1 \end{pmatrix},$$

which is invertible over $\mathbb{Z}_n$ and has determinant $i - j \in \mathbb{Z}_n^*$ (since $|i - j| < m < \min\{p, q\}$), then we have

$$(A_{t,i}, A_{t,j}) = (a_{t-1}, b)\Delta = (2^{-t+1}a, b)\Delta.$$

This implies that for $i \ne j$, the random variables $A_{t,i}$ and $A_{t,j}$ are independent.
So the events $E_{2,i}$ and $E_{2,j}$ and the random variables $A_{t,i}^e y$ and $A_{t,j}^e y$ ($i \ne j$)
are independent. Hence (for $i \ne j$) the events $E_{1,i}$ and $E_{1,j}$ and thus the
indicator variables $I_i$ and $I_j$ are independent. By Lemma 2.5, it follows that
m/2-1

p{ V /^ > ^) > 1 - 

^ , 2 ne 

iz=i — 'Ynl2 



— — 



If Lsb(otx) = 0, then we have Cq > Si and thus 

$$P(C_0 > C_1) \ge 1 - \frac{1}{2|n|}.$$

On the other hand, if $\mathrm{Lsb}(a_t x) = 1$, then by analogy

$$P(C_1 > C_0) \ge 1 - \frac{1}{2|n|}.$$



The assertion of the lemma follows. □ 

Now we are ready to prove Theorem 2.4: 

Proof of Theorem 2.4: We run the following algorithm: 

- Choose $a, b \in \mathbb{Z}_n$ at random.

- Guess $u, v \in [0, 1]$ such that



\ax — un\ < — n 

I I - g 

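and

$$|bx - vn| \le \frac{\varepsilon}{8}n.$$

- Guess $\alpha := \mathrm{Lsb}(ax)$, $\beta := \mathrm{Lsb}(bx)$.

- Compute $\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{|n|}$ by the algorithm $L$ from Lemma 2.6.

- FOR $t = 0, 1, \ldots, |n|$ DO $u \leftarrow \frac{1}{2}(u + \varepsilon_t)$, $a \leftarrow 2^{-1}a$.

RETURN $a^{-1}\lfloor un + \frac{1}{2} \rfloor$.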

It is easy to see that this algorithm is indeed polynomial, since there are only 
polynomially many alternatives for all guesses, and both the calculation of 
each alternative and checking the result can be done in polynomial time. For 
guessing u,v, one has to consider polynomially many (namely 8/e^ and 8/e) 
intervals and there are only 2 possible values for each of $\alpha$ and $\beta$. This algorithm
(called A) has success probability



P(T(n, e, y) = x) > (1 - ;^)”- 

2|n| 



If we repeat A sufficiently many times (with independent inputs and every
time using the trivial deterministic algorithm for testing the result), the
assertion of Theorem 2.4 follows. [The probability of a wrong answer in t
repetitions is bounded by
(- — < (1 —

_ t-l _ |'_?_'|4(l"l)/2'|2t/4(|n|) 

< g-2t/C(|n|) 

< g- log2-r;(|n|) 

_ 2-'/(l"l) 

for large enough f .]□ 

2.5 The Timing Attack on RSA 

The fact that factoring is probably computationally difficult should not lead
us to believe that there are no attacks possible on RSA. See, e.g., Boneh
(1999). Here, we will present a type of attack based on the implementation
of RSA rather than on the algorithm itself. It may be possible for Eve
to measure the time a smartcard uses for performing RSA operations. With
this, she may be able to recover the private key $d_A$ of Alice. We first show the
repeated squaring algorithm for computing $y = x^{d_A} \pmod{n_A}$, which runs in
time linear in $\log d_A$. Let

$$d_A = d_m d_{m-1} \ldots d_0$$

be the binary expansion of $d_A$. Observe that

$$y \equiv \prod_{i=0}^{m} x^{2^i d_i} \pmod{n_A}.$$

The repeated squaring algorithm works as follows:

- Put the initial values $X \leftarrow x$ and $Y \leftarrow 1$.

- For $i = 0, 1, \ldots, m$ put $Y \leftarrow Y \cdot X \pmod{n_A}$ (if $d_i = 1$) and $X \leftarrow X^2 \pmod{n_A}$.

Then at the end we have $Y = y$.
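
The two steps above can be sketched in a few lines of Python. This is only an illustration; the function name `repeated_squaring` is our own choice, and the bits $d_0, d_1, \ldots$ are read off by shifting the exponent to the right.

```python
def repeated_squaring(x, d, n):
    """Compute x**d mod n by scanning the bits d_0, d_1, ... of d."""
    X, Y = x % n, 1          # initial values X <- x, Y <- 1
    while d > 0:
        if d & 1:            # the current bit d_i equals 1
            Y = (Y * X) % n  # Y <- Y * X (mod n)
        X = (X * X) % n      # X <- X^2 (mod n)
        d >>= 1              # move on to the next bit d_{i+1}
    return Y
```

The number of loop iterations equals the number of binary digits of the exponent, and the multiplication $Y \cdot X$ is carried out only for the bits equal to 1. It is exactly this data-dependent branch that the timing attack exploits.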

The timing attack can now be mounted by Eve as follows: She takes a large
number $k$ of random plaintexts $x_1, x_2, \ldots, x_k \in \mathbb{Z}_{n_A}$ and measures the time
$T_i$ the smartcard uses for encrypting $x_i$. Now she may recover the bits $d_i$
of $d_A$ in the following way: Of course $d_A$ is odd, so $d_0 = 1$. In the second
iteration, the smartcard computes $Y \cdot X = x \cdot x^2 \pmod{n_A}$ iff $d_1 = 1$. Let
$t_i$ denote the time the smartcard uses for computing $x_i \cdot x_i^2 \pmod{n_A}$. Eve
can have measured these $t_i$ offline before mounting the attack and compares
them now with the $T_i$. Namely, it turns out (Kocher) that if $d_1 = 1$, the
sequences $\{T_i\}_{1 \le i \le k}$ and $\{t_i\}_{1 \le i \le k}$ are (positively) correlated, whereas in the
other case they behave as independent random variables. So by measuring
the correlation of $\{T_i\}_{1 \le i \le k}$ and $\{t_i\}_{1 \le i \le k}$, Eve can guess $d_1$, etc.
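
The correlation argument can be tried out in a small simulation. The following sketch rests on a purely hypothetical timing model (the cost of one modular multiplication is taken to be the number of 1-bits of the result); real smartcards behave differently, and the helper names `op_time`, `timed_exponentiation` and `bit1_correlation` are our own.

```python
import random

def op_time(a, b, n):
    # Hypothetical timing model: cost = number of 1-bits of a*b mod n.
    return bin((a * b) % n).count("1")

def timed_exponentiation(x, d, n):
    """Repeated squaring; returns (x**d mod n, simulated running time)."""
    X, Y, total = x % n, 1, 0
    while d > 0:
        if d & 1:
            total += op_time(Y, X, n)
            Y = (Y * X) % n
        total += op_time(X, X, n)
        X = (X * X) % n
        d >>= 1
    return Y, total

def pearson(u, v):
    """Empirical correlation coefficient of two equally long samples."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

def bit1_correlation(d, n, k, rng):
    """Correlation between total times T_i and offline times t_i of x_i * x_i^2."""
    xs = [rng.randrange(2, n) for _ in range(k)]
    T = [timed_exponentiation(x, d, n)[1] for x in xs]
    t = [op_time(x, (x * x) % n, n) for x in xs]
    return pearson(T, t)
```

With two 16-bit exponents that differ only in the bit $d_1$, the correlation is clearly positive when $d_1 = 1$ and close to zero when $d_1 = 0$, which is the effect described above.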
Of course, this attack can be avoided by adding artificial delays on the smartcard
so that all modular exponentiations take the same time and can therefore
not be distinguished by measuring time. Another possibility is so-called
blinding, suggested by Rivest: Before encrypting the plaintext $x$, one picks a
random $r \in \mathbb{Z}_{n_A}^*$ and replaces $x$ by

$$x' := x r^{e_A} \pmod{n_A}.$$

Now the RSA-encryption of $x'$ yields

$$y' = (x')^{d_A} \pmod{n_A}$$

and the output of the smartcard is

$$y = y'/r \pmod{n_A}.$$

Here, the RSA-exponentiation with $d_A$ has been applied to $x'$, which behaves
randomly, so the timing attack described above is not possible.
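
That blinding does not change the result can be checked on a toy example. The sketch below assumes the small textbook parameters $n_A = 61 \cdot 53 = 3233$, $e_A = 17$, $d_A = 2753$ (our own choice, far too small for real use); the function name `blinded_exponentiation` is hypothetical.

```python
import math
import random

def blinded_exponentiation(x, d, e, n, rng=random):
    """Compute x**d mod n while exponentiating only the blinded value x'."""
    while True:
        r = rng.randrange(2, n)
        if math.gcd(r, n) == 1:             # r must be invertible modulo n
            break
    x_blind = (x * pow(r, e, n)) % n        # x' := x * r^e (mod n)
    y_blind = pow(x_blind, d, n)            # y' = (x')^d = x^d * r (mod n)
    return (y_blind * pow(r, -1, n)) % n    # y = y' / r (mod n)
```

Since $(x r^{e_A})^{d_A} \equiv x^{d_A} r \pmod{n_A}$, dividing by $r$ gives back $x^{d_A}$, while the value that is actually exponentiated changes with every random choice of $r$.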

If the exponentiation with the secret exponent is done by Montgomery's
multiplication algorithm modulo the prime factors of the modulus, using the
Chinese remainder theorem to obtain the final result, then one cannot use
the above-mentioned attack, but rather the one described in Schindler (2000).
Other related attacks measure the power consumption of the smartcard.
As a consequence one sees that despite the theoretical strength of the
RSA system, its implementation in hardware must be done with much care.
(This is, of course, also true for other cryptosystems.) For more information
on timing attacks, see e.g. Schindler (2002a). Here, interesting methods from
statistical decision theory (which are beyond the scope of this text) come
into play. A combined timing and power attack on RSA was presented in
Schindler (2002b) and Schindler, Walter (2003). An interesting observation
in this direction for elliptic-curve cryptosystems was made by Okeya, Sakurai
(2000).

2.6 *Zero-Knowledge Proof for the RSA Secret Key 

Up to now, we have always assumed that the parties Alice and Bob trust 
each other and that they only want to prevent Eve from eavesdropping. In 
this short section, we take another view: Alice wants to convince Bob that 
she knows some secret but she does not want to give Bob any information 
about the secret itself. For example, she wants to tell Bob that she knows, 
say, Carol’s private RSA key, but she does not want to give him any hint 
as to the key itself or even decrypt one of Carol’s messages to Bob. Let 
$e = e_C$ resp. $d = d_C$ be Carol's public resp. private key. The zero-knowledge
protocol will involve an interactive fair so-called coin-flipping subprotocol.
There are different ways to do this. We present the method with the square
roots. Other methods use exponentiation modulo a prime or Blum integers (see
Schneier (1996), pp. 542f.).
1. Alice chooses two large primes $p, q$ and sends their product $n := pq$ to
Bob.

2. Bob chooses a random natural number $r < n/2$ and sends

$$z := r^2 \pmod n$$

to Alice.

3. Since she knows $p$ and $q$, Alice can compute the 4 square roots $x, -x, y, -y$
(say) of $z \pmod n$. Denote (with slight abuse of notation)

$$x' := \min\{x \pmod n, -x \pmod n\}$$

and

$$y' := \min\{y \pmod n, -y \pmod n\}.$$

Then we have $r \in \{x', y'\}$.

4. Alice guesses if $r = x'$ or $r = y'$ and transmits her guess to Bob.

5. If Alice's guess was correct, then the coin-flipping subalgorithm outputs
1, otherwise 0.

6. Verification subsubprotocol: Alice sends $p$ and $q$ to Bob, Bob computes
$x'$ and $y'$ and sends them to Alice, then Alice calculates $r$.
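
Step 3 can be illustrated with a short Python sketch. It assumes, for simplicity, that both primes satisfy $p \equiv q \equiv 3 \pmod 4$ (so that square roots modulo each prime are obtained by a single exponentiation) and that $r$ is coprime to $n$; these assumptions and the function names are ours and not part of the protocol as stated.

```python
def square_roots_mod_pq(z, p, q):
    """All square roots of z modulo n = p*q, assuming p = q = 3 (mod 4)."""
    n = p * q
    rp = pow(z, (p + 1) // 4, p)        # a square root of z modulo p
    rq = pow(z, (q + 1) // 4, q)        # a square root of z modulo q
    cp = q * pow(q, -1, p)              # CRT coefficient: 1 mod p, 0 mod q
    cq = p * pow(p, -1, q)              # CRT coefficient: 0 mod p, 1 mod q
    return {(sp * cp + sq * cq) % n
            for sp in (rp, p - rp) for sq in (rq, q - rq)}

def candidates(z, p, q):
    """The two values x', y' among which Alice has to guess."""
    n = p * q
    return sorted({min(s, n - s) for s in square_roots_mod_pq(z, p, q)})
```

For the toy values $p = 7$, $q = 11$ and $r = 5$ one gets $z = 25$ and the candidates 5 and 16, so Alice indeed has two equally plausible values to choose from.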

Since Alice can not know r, her guess is really random. In step 4, she tells 
Bob only one bit of her guess in order to prevent him from obtaining both x' 
and y'. If Bob has both of these two numbers, he can change r after step 4. 
Now the zero-knowledge protocol proceeds as follows: 

- Alice and Bob agree on random $k, m$ such that

$$km \equiv e \pmod n$$

(with $k, m > 5$, otherwise they restart the algorithm) using a coin-flipping
protocol.

- Again by a coin-flipping protocol, Alice and Bob generate a random ciphertext $y$.

- Alice uses Carol's private key to compute

$$x = y^d \pmod n$$

and

$$t := x^k \pmod n$$

and sends $t$ to Bob.

- Bob checks if $t^m \equiv y \pmod n$. If yes, he believes Alice.

This protocol can be rerun several times. Then the probability that Alice 
bluffs decreases exponentially with the number of times the algorithm is 
executed. 
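
One round of the protocol can be replayed with toy numbers. The sketch below makes the simplifying assumption that $km = e$ holds exactly as an identity of integers (here $k = 7$, $m = 11$, $e = 77$), which is a special case of the congruence above and guarantees that Bob's check succeeds; the modulus $n = 61 \cdot 53$ and all names are our own choices.

```python
n, phi = 3233, 3120          # toy modulus n = 61 * 53 and phi(n) = 60 * 52
k, m = 7, 11                 # agreed factors, both greater than 5
e = k * m                    # Carol's public exponent (assumed equal to k*m)
d = pow(e, -1, phi)          # Carol's private exponent, known to Alice

def alice_response(y):
    """Alice decrypts y with d and returns t = x^k (mod n)."""
    x = pow(y, d, n)
    return pow(x, k, n)

def bob_accepts(y, t):
    """Bob checks t^m = y (mod n)."""
    return pow(t, m, n) == y % n
```

Here $t^m = x^{km} = x^e \equiv y \pmod n$, so an honest Alice always passes, whereas a value $t$ formed without the private key (for instance $y^k$) is rejected in this example.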




3 Factorization with Quantum Computers: 
Shor’s Algorithm 



3.1 Classical Factorization Algorithms 

The most famous classical factorization algorithms are the Quadratic Sieve 
(QS) and the Number Field Sieve (NFS). Though being subexponential, they 
are not polynomial. The QS is the fastest general-purpose factorization algo- 
rithm for numbers with less than 110 digits, whereas the NFS has the same 
property for numbers with more than 110 digits (see Schneier (1996), p.256). 
Recently, RSA-576, a number with 174 decimal digits, was factorized by
Franke from the University of Bonn with the aid of the NFS. The NFS was
also used to factorize the Mersenne number 2"^^^ — 1 (with 288 decimal digits) 
by the Internet project NESNET (about 5 months of computing time on up 
to 120 machines was necessary). Further limits on the factorization of large 
numbers can be found on the Internet site CiteSeer. For particular types of 
numbers to be factorized, many specially designed algorithms have been de- 
veloped, which in these cases are faster than the above-mentioned ones. A 
new direction of cryptanalysis would be the possible use of quantum computers
instead of classical Turing machines. Up to now, quantum computing
has been more or less only a theoretical concept based on the superposition
principle of quantum mechanics. Beyond some basic experiments, nobody
really has an idea how to realize physical quantum computers that work
efficiently. However, if one day quantum computers could be built, this would
have dramatic consequences for cryptology. Namely, in the second half of the
1990s, Peter Shor showed that on a quantum computer, large numbers can
be factorized in polynomial (with respect to the length of the binary expansion of
the number) time! So in this case, RSA and all related systems would be
worthless against an adversary who has a quantum computer at his disposal.
Shor’s method is in fact a hybrid algorithm in the sense that it consists of 
four components, one being done by quantum computing and three others (a 
little trick based on the Euclidean algorithm from elementary number theory, 
Fourier transform and continued-fraction approximation) that can be done 
on a classical computer. (Note that the Fourier transform component can, 
but need not be done on a quantum computer.) This algorithm will be ex- 
plained in Section 3.4. We note that Shor has developed another algorithm 
for solving the discrete logarithm problem on quantum computers. Here, we 
will not discuss that, but the principles are similar. 
Note that in contrast to Chapter 13, where we will present the ideas of quantum
cryptography, here we use quantum computers to cryptanalyze classical
cryptosystems.

We remark that there is a new approach due to Hungerbühler, Struwe (2003),
who suggest the heat flow as a cryptographic system that also resists attacks
by quantum computers. It is based on the second law of thermodynamics
(increase of entropy). The evolution problem for the heat equation is a
well-posed initial-value problem, which can be solved very precisely by numerical
methods, whereas the evolution problem in backward time is ill-posed
and numerical methods for solving the heat equation for negative time are
inherently unstable.



3.2 Quantum Computing 



Let us now give a short introduction to quantum computing, which rests on
a non-Kolmogorovian type of probability, namely quantum stochastics. In
the following, we will present some basic facts on quantum mechanics and
quantum computing. In quantum physics, the state of a quantum system is
described by a vector in a (complex) Hilbert space $H$. It is customary to
write such a state vector as a column vector, or - in the jargon of quantum
physics - as "ket vector" $|\psi\rangle$. The corresponding row vector is written as
$\langle\psi|$ and called "bra vector". The squared norm of the vector, or - in other
words - the scalar product of the vector with itself is then written as $\langle\psi|\psi\rangle$,
which becomes a bracket. We now come to the process of measurement in
quantum mechanics. As a principle, in quantum mechanics measurements of
observables are described by Hermitian operators $A$ acting on the underlying
Hilbert space $H$. If the system is in an eigenstate of $A$, then the measurement
with the operator $A$ just reproduces the state, multiplied with a real
number (since $A$ is Hermitian). If the system is not in an eigenstate, then
the outcome of the measurement will collapse to one of the observables
(corresponding to eigenstates (eigenvalues)) of $A$, but what is important is that
it cannot be predicted in advance to which one. Only probabilities can be
indicated, which correspond to the principle of superposition. The result of
any measurement of a quantum system described by the state vector $|\psi\rangle$ is
always one of the eigenvalues of the operator $A$, corresponding to the observable
being measured. If the system is in an eigenstate of $A$, then the outcome
of the measurement is just the corresponding eigenvalue of this eigenstate. In
general, the system will be in some general state $|\phi\rangle$. Then we may represent $|\phi\rangle$
as a complex linear combination with respect to a basis $\{\psi_i\}_i$ of eigenstates
of $A$:

$$|\phi\rangle = \sum_i \omega_i |\psi_i\rangle, \qquad (3.1)$$

where the $\omega_i$ are called the probability amplitudes. If w.l.o.g. we assume that
$\sum_i |\omega_i|^2 = 1$, then $|\omega_i|^2$ is interpreted as the probability that the system is in
eigenstate $i$ with respect to $A$. So a quantum system can exist in a blend of
all its eigenstates with respect to a certain Hermitian measurement operator
simultaneously. This is called the principle of superposition and is the big
difference between quantum and classical mechanics, where absolutely no
such analogue exists. If the system is in a superposition of states as in (3.1),
then the probability of each possible outcome of the measurement $A$ (i.e. of
each possible eigenvalue) is given by $|\omega_i|^2$. An unobserved quantum system
is governed by Schrödinger's equation

$$i\hbar \frac{\partial}{\partial t}|\psi(t)\rangle = H(t)|\psi(t)\rangle,$$

where $\hbar = 1.0545 \cdot 10^{-34}$ Js is the (reduced) Planck constant and $H(t)$ is the Hamiltonian
(a Hermitian operator) related to the total energy of the system; so the system
behaves smoothly until it is measured.

Definition 3.1. A qubit (quantum bit) is a quantum 2-state system

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad (3.2)$$

where $\alpha, \beta \in \mathbb{C}$ such that $|\alpha|^2 + |\beta|^2 = 1$. (Note that $|0\rangle$ and $|1\rangle$ are just names
for the eigenvectors representing a classical bit and have nothing to do with
the zero vector in the Hilbert space $H = \mathbb{C}^2$.)

It is an easy exercise to show that for $|\psi\rangle$ as in (3.2) there exist angles $\theta, \phi$
such that, up to a global phase factor,

$$|\psi\rangle = \cos\theta\,|0\rangle + e^{i\phi}\sin\theta\,|1\rangle. \qquad (3.3)$$
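
As a small numerical illustration of (3.2) and (3.3), the following Python sketch builds the two amplitudes from given angles and returns the probabilities of the two measurement outcomes. The function names `qubit` and `outcome_probabilities` are our own.

```python
import cmath
import math

def qubit(theta, phi):
    """Amplitudes (alpha, beta) of cos(theta)|0> + e^{i phi} sin(theta)|1>."""
    return (complex(math.cos(theta)), cmath.exp(1j * phi) * math.sin(theta))

def outcome_probabilities(state):
    """Probabilities |alpha|^2 and |beta|^2 of observing |0> resp. |1>."""
    alpha, beta = state
    return abs(alpha) ** 2, abs(beta) ** 2
```

For $\theta = \pi/3$ the probabilities are $1/4$ and $3/4$, whatever the value of $\phi$: the relative phase does not influence a measurement in the basis $|0\rangle, |1\rangle$.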

So a (single) qubit can, geometrically, be interpreted as a point on the
two-dimensional unit sphere, the north pole (e.g.) representing the eigenstate $|1\rangle$
and the south pole the eigenstate $|0\rangle$. It turns out that information that
uses much memory in classical computers can be stored with far fewer
qubits in quantum computers. Let us begin with a simple example: Assume
we have two classical complementary bitstrings of length 7, e.g., $|0110101\rangle$
and $|1001010\rangle$. In order to store them in a classical computer, we need two
registers each of length 7 bit. However, on a quantum computer, only one
register of 7 qubits suffices, since here we can just store the superposition
$\frac{1}{\sqrt{2}}(|0110101\rangle + |1001010\rangle)$. More generally, if we have an exponential number
of bits to store, by using the superposition principle, a polynomial number of
qubits suffices. (Of course, these superpositions can be very complicated in
general.) An $n$-qubit memory register is realized by the $n$-fold tensor product
of the 1-qubit register. However, quantum evolution of such a register may
lead to states that are defined as a whole, but do not arise from individual
qubits, i.e., the individual qubits are not defined as such. Such states are
called entangled (Schrödinger used the German word "verschränkt" for it). A
very important fact is that measurements of subsets of qubits in an $n$-qubit
register project out the state of the whole register into a subset of eigenstates
consistent with the answers (eigenvalues) obtained from the measurement.
This has to do with quantum teleportation and the Einstein-Podolsky-Rosen
experiment. Another aspect is quantum parallelism. Quantum evolution is
performed by unitary operators (on "single processors"), which operate at
the same time on all possible states. These phenomena will be crucial in
Shor's algorithm.



3.3 Continued Fractions 



A regular continued fraction is an expression of the type

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{a_4 + \cdots}}}} \qquad (3.4)$$

with finitely or infinitely many members $a_k \in \mathbb{N}$ ($k \in \mathbb{N}$), $a_0 \in \mathbb{Z}$. For
graphical simplicity, it is also customary to write (3.4) as

$$[a_0, a_1, a_2, a_3, a_4, \ldots]. \qquad (3.5)$$

For an infinite regular continued fraction (3.5), finite expressions

$$[a_0, a_1, \ldots, a_n]$$

are called "approximating fractions". Finite regular continued fractions
represent rational numbers (moreover, the $a_k$ are uniquely determined if we
suppose that the last one of them is $\ge 2$), whereas for irrational numbers the
following Theorem 3.1 on "continued fraction approximation" is valid. Often
we will use the notation

$$\xi = [a_0, a_1, \ldots]$$

for both finite and infinite continued fractions (i.e. for both $\xi$ rational or
irrational). When $\xi$ is rational, then the above will mean that finite continued
fraction whose last denominator is $> 1$.

Theorem 3.1. Every infinite regular continued fraction $[a_0, a_1, \ldots]$ converges
to an irrational number $\xi_0$. Conversely, every irrational number $\xi_0$ is the
limit of a unique regular continued fraction, which is necessarily infinite:
$\xi_0 = [a_0, a_1, \ldots]$.

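Proof: 1. It is easily verified that one may write

$$[a_0, a_1, \ldots, a_n] = \frac{A_n}{B_n}$$

if one defines the $A_n$ and $B_n$ by the linear depth-2 recursion $A_{-1} := 1$, $B_{-1} := 0$,
$A_0 := a_0$, $B_0 := 1$,

$$A_n := a_n A_{n-1} + A_{n-2}, \qquad B_n := a_n B_{n-1} + B_{n-2} \quad (n \ge 1). \qquad (3.6)$$

It follows from (3.6) that $\lim_{n\to\infty} A_n = \lim_{n\to\infty} B_n = \infty$. Consider the
system of equations

$$\xi_n = a_n + \frac{1}{\xi_{n+1}} \quad (n \in \mathbb{N}_0). \qquad (3.7)$$

System (3.7) can also be written as

$$\xi_0 = [a_0, a_1, \ldots, a_{n-1}, \xi_n] \quad (n \in \mathbb{N}), \qquad (3.8)$$

which is equivalent to

$$\xi_0 = \frac{A_{n-1}\xi_n + A_{n-2}}{B_{n-1}\xi_n + B_{n-2}}. \qquad (3.9)$$

Hence we get

$$\xi_0 - \frac{A_{n-1}}{B_{n-1}} = \frac{A_{n-2}B_{n-1} - A_{n-1}B_{n-2}}{B_{n-1}(B_{n-1}\xi_n + B_{n-2})} = \frac{(-1)^{n-1}}{B_{n-1}(B_{n-1}\xi_n + B_{n-2})} \to 0 \quad (n \to \infty), \qquad (3.10)$$

i.e. indeed

$$\xi_0 = \lim_{n\to\infty} \frac{A_n}{B_n} = [a_0, a_1, \ldots]. \qquad (3.11)$$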



2. Now we show that every infinite regular continued fraction converges to
some limit $\xi_0$, i.e., that

$$\xi_0 := \lim_{n\to\infty} \frac{A_n}{B_n} \qquad (3.12)$$

exists. For the difference between two approximating fractions,
after some calculations, one obtains the estimate

$$\left|\frac{A_{n+m-1}}{B_{n+m-1}} - \frac{A_{m-1}}{B_{m-1}}\right| < \frac{1}{B_{m-2}B_{m-1}}, \qquad (3.13)$$

which shows that the sequence $\{A_n/B_n\}_{n \ge 1}$ is indeed a Cauchy sequence, i.e.
convergent to some real number $\xi_0$.

3. It remains to prove that the $\xi_0$ as it was just defined in 2. is irrational. Put

$$\xi_n := [a_n, a_{n+1}, \ldots] \quad (n \in \mathbb{N}_0).$$

From (3.9), this yields (similarly as for (3.10))

$$B_{n-1}\xi_0 - A_{n-1} = \frac{B_{n-1}A_{n-2} - B_{n-2}A_{n-1}}{B_{n-1}\xi_n + B_{n-2}} \qquad (3.14)$$

$$= \frac{(-1)^{n-1}}{B_{n-1}\xi_n + B_{n-2}} \to 0 \quad (n \to \infty). \qquad (3.15)$$

So, as $n \to \infty$, the expression $B_{n-1}\xi_0 - A_{n-1}$ tends to zero without really
becoming zero, which is only possible if $\xi_0$ is irrational (for, otherwise, if
$\xi_0 = p/q$ ($p, q \in \mathbb{Z}$), the term $|B_{n-1}\xi_0 - A_{n-1}| = |B_{n-1}p - A_{n-1}q|/|q|$
cannot fall below $1/|q|$ without becoming zero).

4. Finally, we show that the approximation of an irrational number by an
infinite regular continued fraction is unique. Of course we have $\xi_n \ge a_n$ for
all $n$. Since $\xi_0$ and thus all $\xi_n$ are irrational, equality is not possible, so we
have $\xi_n > a_n$ for all $n \in \mathbb{N}_0$. Now if

$$\xi_0 = [b_0, b_1, \ldots]$$

and if we put

$$\eta_n := [b_n, b_{n+1}, \ldots],$$

then we also have

$$\eta_{n-1} = [b_{n-1}, b_n, \ldots]$$

and thus

$$\eta_{n-1} = [b_{n-1}, \eta_n] = b_{n-1} + \frac{1}{\eta_n}. \qquad (3.16)$$

Since $\eta_n > b_n \ge 1$, it follows that $b_{n-1}$ is the largest integer contained in $\eta_{n-1}$,
so in particular, $b_0$ is the largest integer contained in $\eta_0 = \xi_0$, and thus $b_0 = a_0$.
But in this case, equation (3.16) yields $\eta_1 = \xi_1$, and by the same reasoning as
before, it follows that $b_1 = a_1$, etc. □

The following proposition (whose proof is just a short verification) will be of
importance in the proof of Theorem 3.2:

Proposition 3.1. If

$$\xi_0 = [a_0, a_1, \ldots, a_{n-1}, \xi_n]$$

and

$$\xi_n = [a_n, a_{n+1}, \ldots],$$

then it follows that

$$\xi_0 = [a_0, a_1, \ldots, a_n, a_{n+1}, \ldots].$$

Now we are ready to state the result that will be used to develop Shor's
algorithm:

Theorem 3.2. If $c, d \in \mathbb{Z}$ obey the inequality

$$\left|\xi_0 - \frac{c}{d}\right| < \frac{1}{2d^2}, \qquad (3.17)$$

then $\frac{c}{d}$ is an approximating fraction of $\xi_0$.
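Proof: Put

$$\xi_0 - \frac{c}{d} = \frac{\delta\theta}{d^2}, \qquad (3.18)$$

where $\delta = \pm 1$ and then $0 < \theta < \frac{1}{2}$. Furthermore, we write $\frac{c}{d}$ as a finite
continued fraction

$$\frac{c}{d} = [a_0, a_1, \ldots, a_{n-1}], \qquad (3.19)$$

where we choose $n$ even or odd such that $(-1)^{n-1} = \delta$. If we define $A_k$ and
$B_k$ ($0 \le k \le n-1$) as in the proof of Theorem 3.1 (so in particular $A_{n-1} = c$
and $B_{n-1} = d$) and $\omega$ by the equation

$$\xi_0 = \frac{A_{n-1}\omega + A_{n-2}}{B_{n-1}\omega + B_{n-2}}, \qquad (3.20)$$

then this is equivalent to

$$\xi_0 - \frac{A_{n-1}}{B_{n-1}} = \frac{A_{n-2}B_{n-1} - A_{n-1}B_{n-2}}{B_{n-1}(B_{n-1}\omega + B_{n-2})} = \frac{(-1)^{n-1}}{B_{n-1}(B_{n-1}\omega + B_{n-2})}, \qquad (3.21)$$

so by (3.18)

$$\theta = \frac{B_{n-1}}{B_{n-1}\omega + B_{n-2}},$$

or - in other words -

$$\omega = \frac{B_{n-1} - \theta B_{n-2}}{\theta B_{n-1}}. \qquad (3.22)$$

Equation (3.20) may be rewritten as

$$\xi_0 = [a_0, a_1, \ldots, a_{n-1}, \omega]. \qquad (3.23)$$

Since $0 < \theta < \frac{1}{2}$ we have $\omega > 1$. Now we may develop $\omega$ into a regular continued
fraction

$$\omega = [a_n, a_{n+1}, \ldots]$$

with $a_n \ge 1$. By Proposition 3.1 it follows that

$$\xi_0 = [a_0, a_1, \ldots, a_{n-1}, a_n, \ldots], \qquad (3.24)$$

hence $\frac{c}{d}$ is an approximating fraction for $\xi_0$. □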
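
The recursion (3.6) and the statement of Theorem 3.2 are easy to try out for rational numbers. The following Python sketch uses exact arithmetic from the standard module `fractions`; the function names `expansion` and `convergents` are our own.

```python
from fractions import Fraction

def expansion(xi):
    """Terms [a_0, a_1, ...] of the finite continued fraction of a rational xi."""
    terms = []
    while True:
        a = xi.numerator // xi.denominator   # largest integer contained in xi
        terms.append(a)
        if xi == a:
            return terms
        xi = 1 / (xi - a)                    # xi_{n+1} from xi_n = a_n + 1/xi_{n+1}

def convergents(terms):
    """Approximating fractions A_n/B_n computed by the recursion (3.6)."""
    A_prev, B_prev = 1, 0                    # A_{-1}, B_{-1}
    A, B = terms[0], 1                       # A_0, B_0
    result = [Fraction(A, B)]
    for a in terms[1:]:
        A, A_prev = a * A + A_prev, A
        B, B_prev = a * B + B_prev, B
        result.append(Fraction(A, B))
    return result
```

For example, $415/93 = [4, 2, 6, 7]$ with approximating fractions $4$, $9/2$, $58/13$, $415/93$. The fraction $58/13$ differs from $415/93$ by $1/1209 < 1/(2 \cdot 13^2)$, in accordance with Theorem 3.2.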



3.4 The Algorithm 

First we will show a trick that reduces the determination of the prime factors
to the calculation of the period of a certain number-theoretic function.
Namely, assume $x$ is coprime to $n$ and define the exponential function (with
base $x$) modulo $n$:

$$f_n(a) := x^a \pmod n.$$

It is well-known that the sequence $\{f_n(a)\}_{a \in \mathbb{N}_0}$ is periodic. The length $r$ of
the period is called "the period of $x \pmod n$". Assume $r$ is even. Then (by
the definition of the period, since $x$ is coprime to $n$) we have that

$$x^r \equiv 1 \pmod n$$

and hence

$$(x^{r/2} - 1)(x^{r/2} + 1) \equiv 0 \pmod n. \qquad (3.25)$$

This means that (unless $x^{r/2} \equiv \pm 1 \pmod n$) at least one of the terms
$x^{r/2} \pm 1$ has a nontrivial factor in common with $n$. (Note that $x^{r/2} \equiv 1$
would be the case if $r/2$ were already the period or a multiple of it, that is
why in the algorithm it will be important to really find the genuine period
and not a multiple of it.) So as soon as we have determined the period $r$, we
have a good chance of finding a factor of $n$ by computing (by the Euclidean
algorithm, e.g.) the numbers $\gcd(x^{r/2} \pm 1, n)$. So our goal must be to determine
efficiently the period of exponential functions modulo $n$ (where $n$ is the
number to be factorized).
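
For small numbers the reduction can be carried out entirely on a classical computer. In the Python sketch below the period is found by brute force, which of course takes exponential time in general and merely stands in for the quantum part; the function names are our own, and $x$ is assumed to be coprime to $n$.

```python
import math

def period(x, n):
    """Smallest r >= 1 with x**r = 1 (mod n); requires gcd(x, n) = 1."""
    r, value = 1, x % n
    while value != 1:
        value = (value * x) % n
        r += 1
    return r

def factors_from_period(x, n):
    """Return (gcd(x^(r/2) - 1, n), gcd(x^(r/2) + 1, n)), or None on failure."""
    r = period(x, n)
    if r % 2 == 1:
        return None                  # odd period: choose another x
    half = pow(x, r // 2, n)
    if half == n - 1:
        return None                  # x^(r/2) = -1 (mod n): no information
    return math.gcd(half - 1, n), math.gcd(half + 1, n)
```

For $n = 15$ and $x = 7$ the period is $r = 4$, $x^{r/2} \equiv 4$, and the two greatest common divisors are 3 and 5. For $x = 14 \equiv -1 \pmod{15}$ the method fails, which is the exceptional case excluded above.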

Now we are prepared for presenting Shor's factorization algorithm in detail.
Let $n = pq$ ($p, q$ prime) be the number to be factorized.

Shor's Factorization Algorithm

- Choose a number $d$ with small prime factors such that $2n^2 \le d \le 3n^2$.

- Choose a random integer $x$ that is coprime to $n$.

- Repeat the following steps $\log d$ times using the same $x$ every time:

- Create a quantum memory register of $2d$ non-negative integers modulo
$n$ and partition it into two halves called reg1 and reg2. For the state of
the whole register we will write the ket vector $|\mathrm{reg1}, \mathrm{reg2}\rangle$.

- Load reg1 with the integers $0, 1, \ldots, d-1$ and reg2 with zeroes at all
places, afterwards normalize the register such that we may write (with
a little abuse of notation) the state of the whole register as ket vector

$$|\psi\rangle = \frac{1}{\sqrt{d}} \sum_{a=0}^{d-1} |a, 0\rangle.$$

- Perform the transformation $a \mapsto x^a \pmod n$ (using quantum parallelism)
on each (non-normalized) number in reg1 and place the results to the
corresponding places in reg2. Denote by $r$ the period of the above
transformation. Then the state of the (normalized) complete register becomes

$$|\psi\rangle = \frac{1}{\sqrt{d}} \sum_{a=0}^{d-1} |a, x^a \pmod n\rangle.$$

- Measure the content of reg2 by the Hermitian operator $A$. Then this
collapses to some $k$ and has the effect of projecting out the state of reg1 to
be a superposition of exactly those values of $a$ for which $x^a \equiv k \pmod n$.
Hence the state of the complete register is

$$|\psi\rangle = \frac{1}{\sqrt{|M|}} \sum_{a' \in M} |a', k\rangle,$$

where $M := \{a' : x^{a'} \equiv k \pmod n\}$.

- Compute the discrete (fast) Fourier transform of the projected state in
reg1 and put this result back to reg1. This maps the projected state in
reg1 into a superposition

$$|\psi\rangle = \frac{1}{\sqrt{|M|}} \sum_{a' \in M} \frac{1}{\sqrt{d}} \sum_{h=0}^{d-1} e^{2\pi i a' h/d}\,|h, k\rangle.$$

- Now the Fourier transform in reg1 is a periodic function peaked at multiples
of the inverse period $1/r$. States corresponding to integer multiples
of $1/r$ and those close to them appear with greater probability amplitudes
than those that do not correspond to integer multiples of the inverse
period. So in each step, we get a number $h'$ such that $\frac{h'}{d}$ is near
to the multiple $\frac{\lambda}{r}$ of the inverse period of the exponential map for some
$\lambda \in \mathbb{N}$. In order to estimate $\lambda$, one can compute the continued fraction
expansion of $\frac{h'}{d}$ as long as the denominator is less than $n$ and then retain
the closest such fraction as $\frac{\lambda}{r}$. If this is done sufficiently often, we have
enough samples $\lambda_j$ that lead to a guess of the true $\lambda$ and thus of $r$.

- Now that we know $r$, we can determine the factors of $n$ (with high probability)
as demonstrated at the beginning of this section.
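
The classical post-processing of a measured value $h'$ can be sketched as follows. The function below (the name `period_candidate` is ours) runs through the approximating fractions of $h'/d$ by the recursion (3.6) and returns the denominator of the last one whose denominator is still less than $n$; the sample values of $h'$ used afterwards are hypothetical measurement outcomes, chosen by hand near multiples of $d/r$.

```python
from fractions import Fraction

def period_candidate(h, d, n):
    """Denominator of the last approximating fraction of h/d with denominator < n."""
    xi = Fraction(h, d)
    a = xi.numerator // xi.denominator
    B, B_prev = 1, 0                     # B_0 and B_{-1} of the recursion (3.6)
    best = B
    while xi != a:
        xi = 1 / (xi - a)                # next complete quotient
        a = xi.numerator // xi.denominator
        B, B_prev = a * B + B_prev, B
        if B >= n:
            break                        # denominator too large: stop
        best = B
    return best
```

For $n = 15$, $x = 7$ (period $r = 4$) and $d = 512$, the outcome $h' = 384$ gives $384/512 = 3/4$ and hence the candidate 4. For $n = 21$, $x = 2$ (period 6) and $d = 1024$, the outcome $h' = 171$ leads to $1/6$. If $\lambda$ and $r$ have a common factor, only a divisor of $r$ is found (e.g. $h' = 256$, $d = 512$ yields 2), which is one reason why several samples are needed.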

We remark that of course (with rather low probability) Shor's algorithm can
fail. Such counterexamples are easily constructed. But they represent rather
atypical cases.

Furthermore, instead of using the classical (fast) Fourier transform, there
are also quantum algorithms for the Fourier transform, which make Shor's
algorithm run even faster in practice, but not to such an extent that
the order of complexity is improved. Seifert (2001) suggests
an approach where, in contrast to the above-mentioned algorithm, he uses
simultaneous diophantine approximation to reduce considerably the number
of qubits necessary.




4 Physical Random-Number Generators 



4.1 Generalities 

The doctrine in cryptology is that the algorithm of encryption is known to 
the adversary (Eve) and that the only thing that is kept secret is the key, 
which normally is a bitsequence or a sequence of natural numbers or el- 
ements of a finite ring (e.g. a residue ring or a finite field). Mostly, such 
key sequences are produced by an algorithmic generator (i.e., they are so- 
called pseudo-random numbers), since these offer the following benefits: the 
sequence of numbers can be reproduced for debugging and testing; no special 
hardware is necessary; a large quantity of random numbers can be produced 
in a short time. In Chapter 7, we will provide several tests of "randomness"
of such pseudo-random sequences. However, there is no practically implementable
"universal" test of randomness: every test procedure just measures
a certain aspect of "non-regularity". If one wants to have genuine random
numbers, then they have to be produced by a physical device. A very drastic
drawback of classical pseudo-random generators has been pointed out in the
paper entitled "Random numbers fall mainly in the planes" by Marsaglia
(1968). Possible physical random sources are electronic noise produced by a 
semiconducting diode (Richter (1993)) or the impulses of a Geiger counter 
in connection with a radioactive source (Inoue et al. (1983)). In the latter 
paper, the authors propose a hardware implementation of this device, the 
radioactive source consisting of a PG-508 pulse generator. Another device 
using a Geiger counter has been described in Nisley (1990), the RM-60 Micro 
Roentgen Radiation Monitor from Aware Electronics. Finally, there is HOT 
BITS (see Walker (1996)), a source of random bits available via the Internet, 
which uses beta radiation from the decay of Krypton-85. 

The output of such a generator (which in the latter case leads directly to a
Poisson (for the number of events) resp. exponential (for the inter-occurrence
waiting times) distribution) has to be processed further in order to obtain
standard uniform random numbers (digits, or reals in [0, 1]). Since the
parameters of the distribution of the data are not known exactly, only a small
amount of this information is used (usually the last digit), to be on the safe
side, and so the yield of this method is relatively small. However, physically
generated random numbers are expensive and cannot be produced in very
large quantities. For example, the HOT BITS hardware produces only about
240 bits per second.

Modifying an idea of von Neumann (1963), used to extract unbiased bits from
a sequence of biased ones by comparison of two subsequent bits, we propose
to obtain random numbers in [0, 1] from a sequence $X_0, X_1, X_2, \ldots$ of
independent exponentially distributed data by using $U_n := X_{2n}/(X_{2n} + X_{2n+1})$. This gives
us one real number for every two data values instead of only two bits,
considerably increasing the output. If the distribution of the $X_n$ is exponential,
the $U_n$ are uniform in [0, 1]. The question of the "rate of disappearing" of
the bias (so-called extraction rate) is addressed in Section 4.3, in particular
for rational biases $b$. It turns out that the size of $b$ does not influence the
extraction rate, but that the latter is solely determined by the arithmetic
properties of $b$. On the other hand, the extraction rate can be shown to be 0
in Lebesgue-almost all cases.
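
The classical idea of von Neumann referred to above can be sketched as follows. The convention chosen here (an unequal pair is mapped to its first bit, equal pairs are discarded) is one common variant and an assumption on our part, as is the function name.

```python
def von_neumann_extract(bits):
    """Unbiased bits from independent, identically biased bits.

    Consecutive non-overlapping pairs are compared: the pair (0, 1) yields 0,
    the pair (1, 0) yields 1, and the pairs (0, 0) and (1, 1) are discarded.
    """
    out = []
    for first, second in zip(bits[0::2], bits[1::2]):
        if first != second:
            out.append(first)
    return out
```

Since the pairs (0, 1) and (1, 0) have the same probability whatever the bias is, the output bits are unbiased; the price is that a large part of the input is thrown away.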

In the practical implementation of this method we have to take into account 
that the exponential times can only be measured up to a certain precision. 



4.2 Construction of Uniformly Distributed Random 
Numbers from a Poisson Process 

In this section, we will consider the output of a Geiger counter as source of
randomness. The other examples mentioned in the previous section are of a
similar nature. If the number of impulses during a fixed amount of time is
counted, a variable with a Poisson distribution is the raw material that has
to be processed further in order to obtain unbiased random bits. Usually, the
length $t_0$ of the time interval is chosen large with respect to the mean time
$1/\theta$ between two impulses; then the number $N$ of counts during this interval
has a Poisson distribution with $\lambda = t_0\theta$. In most cases, the last digit $X$ in
the binary representation of $N$ is used as an approximation for a uniformly
distributed random bit.

Another method (see Inoue et al. (1983)) makes use of the random waiting
time $T$ between two consecutive impulses, which obeys an exponential
distribution $e_\theta$. Clearly, if the intensity $\theta > 0$ were known exactly, one could
obtain a uniformly distributed random variable $U$ just by the usual transform
method $U := \exp(-\theta T)$. But $\theta$ not being known exactly enough to guarantee
that $U$ is "sufficiently" uniform, it has to be estimated. One can use two
consecutive waiting times produced by the Geiger counter, one so to say to
estimate $\theta$ and the other one to obtain a uniform random variable.

The following lemma is easy: 

Lemma 4.1. Let $X$ and $Y$ be independent random variables with common
exponential distribution with parameter $\theta > 0$. Then

$$U := \frac{X}{X + Y}$$

obeys a uniform law on [0, 1].

Therefore if the raw material is a stream of independent exponential random
times $X_n$ ($n \in \mathbb{N}$), a sequence of independent uniform variables can be
obtained by setting $U_n := X_{2n}/(X_{2n} + X_{2n+1})$.
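
Lemma 4.1 can be illustrated by simulation. The sketch below is a minimal example, assuming Python's standard `random.expovariate` as the source of exponential waiting times; the helper names, the seed and the value $\theta = 3.7$ are arbitrary choices for illustration. It compares the empirical distribution function of the $U_n$ with that of the uniform law at a few points.

```python
import random

def uniform_from_exponentials(theta, count, rng):
    """Return `count` values X / (X + Y) built from pairs of exponential times."""
    values = []
    for _ in range(count):
        x = rng.expovariate(theta)
        y = rng.expovariate(theta)
        values.append(x / (x + y))
    return values

def empirical_cdf(values, t):
    """Fraction of the sample that is <= t."""
    return sum(v <= t for v in values) / len(values)

rng = random.Random(2004)
sample = uniform_from_exponentials(theta=3.7, count=20000, rng=rng)
for t in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(t, round(empirical_cdf(sample, t), 3))
```

Note that the value of $\theta$ does not enter the construction itself, which is the point of the method: the intensity need not be known.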

Unfortunately the waiting time between two impulses of the Geiger counter
is not measured as a real number, but only in multiples of the length $\Delta$ of
the clock cycle (w.l.o.g. we may assume $\Delta = 1$). If two impulses occur during
one clock cycle, then they are counted as one. Hence the n-th observation of
an impulse occurs at the time $S'_n$ defined recursively by $S'_0 := 0$,

$$S'_n := \min\{k \in \mathbb{N} : N_k \ge N_{S'_{n-1}} + 1\},$$

where the Poisson process $\{N_t\}_{t \ge 0}$ (see footnote 1) indicates the number of
impulses up to time $t$. Instead of the sequence $\{X_n\}_{n \ge 1}$ of exactly
exponentially distributed waiting times between two impulses, we can only
observe the sequence $\{X'_n\}_{n \ge 1}$, where $X'_n := S'_n - S'_{n-1}$.

Proposition 4.1. The $X'_1, X'_2, \ldots$ are i.i.d. such that $X'_n - 1$ obeys a
geometric distribution with parameter $1 - \exp(-\theta)$.

Proof: Let $\{\mathcal{F}_t\}_{t \ge 0}$ denote the canonical filtration of the Poisson process
$\{N_t\}_{t \ge 0}$. Then $S'_1 < S'_2 < \cdots$ is a sequence of stopping times. Assume $n \ge 2$.
Since the Poisson process is stationary with independent increments, the
process $\{N'_t\}_{t \ge 0}$ with $N'_t := N_{t + S'_{n-1}} - N_{S'_{n-1}}$ is again a Poisson process
with parameter $\theta$. Therefore the distribution of

$$X'_n = S'_n - S'_{n-1} = \min\{k \in \mathbb{N} : N'_k \ge 1\}$$

is the same as that of $X'_1$. The latter law can easily be calculated to be the
geometric distribution with parameter

$$P(X'_1 - 1 = 0) = P(X_1 < 1) = 1 - \exp(-\theta).$$

^1 A stochastic process $\{N_t\}_{t \ge 0}$ is called a Poisson process with intensity
$\lambda > 0$ if $N_t$ obeys a Poisson distribution with parameter $\lambda t$ ($t \ge 0$). This is
equivalent to the fact that for the "jump times"

$$T_0 = 0 < T_1 < T_2 < \ldots$$

(where $T_k := \inf\{t \ge 0 : N_t \ge k\}$) we have that the "inter-occurrence times"
$T_{k+1} - T_k$ are i.i.d. exponentially distributed random variables, as

$$P(T_{k+1} - T_k > x) = \exp(-\lambda x)$$

for $x \ge 0$.
Furthermore, the process $\{N'_t\}_{t \ge 0}$ and hence the random variable $X'_n$ is
independent of $\mathcal{F}_{S'_{n-1}}$. On the other hand, the random variables
$X'_1, X'_2, \ldots, X'_{n-1}$ are $\mathcal{F}_{S'_{n-1}}$-measurable and hence independent of $X'_n$. □
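
The effect of the clock discretization described in Proposition 4.1 can be checked by simulation. The following is a minimal sketch, assuming clock length 1 and using `random.expovariate` for the exact waiting times; the function name `observed_waiting_times`, the seed and $\theta = 0.7$ are illustrative assumptions. Impulses that fall into a clock cycle already counted are merged, as in the text.

```python
import math
import random

def observed_waiting_times(theta, count, rng):
    """Simulate X'_n = S'_n - S'_{n-1} for a Poisson process read at integer ticks."""
    waits = []
    t = 0.0          # exact time of the most recent impulse
    last_tick = 0    # S'_{n-1}, the tick of the previous observation
    while len(waits) < count:
        t += rng.expovariate(theta)
        if t <= last_tick:
            continue             # same clock cycle as an impulse already counted
        tick = math.ceil(t)      # first tick k with N_k >= N_{S'_{n-1}} + 1
        waits.append(tick - last_tick)
        last_tick = tick
    return waits

theta = 0.7
waits = observed_waiting_times(theta, 20000, random.Random(4))
p = 1.0 - math.exp(-theta)
frac_one = sum(w == 1 for w in waits) / len(waits)
mean_wait = sum(waits) / len(waits)
print(frac_one, p)           # P(X' - 1 = 0) versus 1 - exp(-theta)
print(mean_wait, 1.0 / p)    # mean of X' versus 1/p
```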

Denote by $F_X$ the distribution function of a random variable $X$.

Theorem 4.1. Let $X' - 1$, $Y' - 1$ be independent geometric random variables
with parameter $\theta' = 1 - \exp(-\theta)$ and denote by $U$ a random variable
distributed uniformly on the interval [0, 1]. Then $U' := X'/(X' + Y')$ satisfies

$$\tfrac{1}{2}\tanh(\theta/2) \le \|F_{U'} - F_U\|_\infty \le 1 - \exp(-\theta/2).$$

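Proof: The lower bound follows from the observation that $F_U$ is continuous
at $\tfrac12$ whereas $F_{U'}$ has a jump of size

$$P(U' = \tfrac12) = \sum_{n=1}^{\infty} P(X' = n)P(Y' = n) = \sum_{n=1}^{\infty} \theta'^2 (1 - \theta')^{2(n-1)} = \frac{\theta'}{2 - \theta'} = \tanh(\theta/2)$$

there, so that $\|F_{U'} - F_U\|_\infty$ is at least half of this jump.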

We will now assume w.l.o.g. that X' and Y' are of the form X' = [XJ and 
Y' = [yj with X, Y independent and with exponential distribution with 
parameter $\theta$. Let $U := X/(X + Y)$. Since the distribution of $U'$ is symmetric
about $\tfrac12$ it is easy to see that for the upper bound it is sufficient to show
$|F_{U'}(t) - t| \le 1 - \exp(-\theta/2)$ only for $t \in\, ]0, \tfrac12]$. For such $t$ we have



Fu'(i)-i = B(l|o.,l(C')-llo..](C)) 



m,nei]N 



*l{m<X<m+l, n<Y <n+l}^ 



= S'+-S'_, 



where 






S+ = ^ P{m < X < m + 1, n <Y < n+1, ^ 



m+0.5 m + 1 V.. 

m + n + 1 m + n + 1 



and 



S- = ^ P{m <X<TO-|-1, n <Y < n + 1, — — — < t). 



X + Y 



n + n + l >*i m + ii + l 



The last equality follows from the fact that the random variable in the
expectation takes only the values $-1$, $0$, or $1$, and all summands with either
$\frac{m+1}{m+n+1} \le t$ or $\frac{m}{m+n+1} \ge t$ vanish, since the square
$]m, m+1[\, \times\, ]n, n+1[$ lies completely on one side of the line
$\{x/(x+y) = t\}$ in these cases.

We now collect the summands in $S_+$ and $S_-$ that belong to the same $m$. Let
$a := (1 - t)/t \ge 1$; then $x/(x+y) > t$ if and only if $ax > y$.
From this we obtain

$$S_+ = \sum_{m=0}^{\infty} P(X < m+1,\ Y \ge n_m,\ aX > Y)$$

and

$$S_- = \sum_{m=0}^{\infty} P(X > m,\ Y < n_m,\ aX < Y),$$

where Um is the smallest n G IN with < t. Considering a single 

summand we have 

$$P(X > m,\ Y < n_m,\ aX < Y) \le P(m < X < m + \tfrac12,\ am < Y < n_m) = c \cdot P(m < X < m+1,\ am < Y < n_m)$$

with

$$c := P(m < X < m + \tfrac12)/P(m < X < m+1) = (1 + \exp(-\theta/2))^{-1}.$$

In order to prove the latter inequality we have used the fact that the density
of $P$ is a decreasing function of $x + y$ and an elementary geometric argument
in the $m$-$n_m$-plane. A similar argument yields

$$P(X < m+1,\ Y \ge n_m,\ aX > Y) \le c \cdot P(m < X < m+1,\ n_m \le Y < a(m+1)).$$

Summing up, we obtain

$$S_+ + S_- \le c \cdot \sum_{m=0}^{\infty} P(m < X < m+1,\ am < Y < n_m) + c \cdot \sum_{m=0}^{\infty} P(m < X < m+1,\ n_m \le Y < a(m+1))$$

$$= c \cdot \sum_{m=0}^{\infty} P(m < X < m+1,\ am < Y < a(m+1))$$

$$= c \cdot \sum_{m=0}^{\infty} \exp(-\theta(a+1)m)\,(1 - \exp(-\theta))(1 - \exp(-a\theta))$$

$$= \frac{(1 - \exp(-\theta))(1 - \exp(-a\theta))}{(1 + \exp(-\theta/2))(1 - \exp(-(a+1)\theta))}.$$

But since $1 - \exp(-a\theta) \le 1 - \exp(-(a+1)\theta)$, we finally get the bound

$$|F_{U'}(t) - t| = |S_+ - S_-| \le S_+ + S_- \le \frac{1 - \exp(-\theta)}{1 + \exp(-\theta/2)} = 1 - \exp(-\theta/2),$$

and this proves Theorem 4.1. □

The upper and lower bounds $1 - \exp(-\theta/2) \approx \theta/2$ and
$\tfrac12 \tanh(\theta/2) \approx \theta/4$ in Theorem 4.1 differ by a factor of
approximately 2. As one can see from numerical experiments, the lower of these
is the true value, but the proof of this fact is more complicated.
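
A numerical experiment of this kind can be sketched as follows. The code is only an illustration under stated assumptions: the geometric laws are truncated at `kmax` (the neglected mass is of order $\exp(-\theta \cdot$ `kmax`$)$), and the function name `sup_distance` is hypothetical. It evaluates $\|F_{U'} - F_U\|_\infty$ by checking both one-sided limits at every jump point of $F_{U'}$, where the supremum must be attained.

```python
import math
from fractions import Fraction

def sup_distance(theta, kmax=150):
    """Approximate sup_t |F_{U'}(t) - t| for U' = X'/(X'+Y'), X'-1, Y'-1 geometric."""
    p = 1.0 - math.exp(-theta)                 # parameter theta' of the theorem
    mass = {}
    for i in range(1, kmax + 1):
        for j in range(1, kmax + 1):
            u = Fraction(i, i + j)             # exact value, so equal points merge
            mass[u] = mass.get(u, 0.0) + p * p * (1.0 - p) ** (i + j - 2)
    cdf, worst = 0.0, 0.0
    for u in sorted(mass):
        t = float(u)
        worst = max(worst, abs(cdf - t))       # left limit F(t-)
        cdf += mass[u]
        worst = max(worst, abs(cdf - t))       # value F(t)
    return worst

theta = 0.5
d = sup_distance(theta)
lower = 0.5 * math.tanh(theta / 2)
upper = 1.0 - math.exp(-theta / 2)
print(lower, d, upper)
```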
4.3 *The Extraction Rate for Biased Random Bits

In this section, it will make things a little simpler (e.g., as we will see, we
can work with expectations) if we replace $\mathbb{B} = \{0, 1\}$ by $\mathcal{B} := \{1, -1\}$. Since
there will be no danger of misunderstanding, the elements of $\mathcal{B}$ will also be
called (random) "bits".

We want to investigate the following question: Given $n$ i.i.d. random bits
$X_1, \ldots, X_n$ with common bias $b$ (i.e. $P(X_i = 1) - P(X_i = -1) = E(X_i) = b \in\, ]0, 1[$
(w.l.o.g.)), how is it possible to construct from them an "as unbiased as
possible" random bit? It turns out that a good method is to multiply (see
footnote 2) the $X_i \in \mathcal{B}$, for if

$$P_n := \prod_{i=1}^{n} X_i,$$

then the bias of $P_n$ turns out to be only $b^n$, i.e. $P(P_n = 1) - P(P_n = -1) = b^n$.
One may ask if there are functions $f : \mathcal{B}^n \to \mathcal{B}$ that behave better (in the
sense of bias reduction) than multiplication. Let us define, for $f$ and $b$ as
defined before, the quantity

$$\Xi_{f,n}(b) := |E(f(X_1, X_2, \ldots, X_n))|$$

and

$$\Xi_n(b) := \min_f \Xi_{f,n}(b).$$
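
The claim about the multiplication function can be verified by brute force for small $n$. The sketch below enumerates all $2^n$ outcomes in $\{1, -1\}^n$ with exact rational arithmetic; the function name `product_bias` and the example values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def product_bias(b, n):
    """Exact E(X_1 * ... * X_n) for i.i.d. +-1 bits with E(X_i) = b."""
    p_plus = (1 + b) / 2       # P(X_i = 1)
    p_minus = (1 - b) / 2      # P(X_i = -1)
    expectation = Fraction(0)
    for bits in product((1, -1), repeat=n):
        prob = Fraction(1)
        value = 1
        for x in bits:
            prob *= p_plus if x == 1 else p_minus
            value *= x
        expectation += prob * value
    return expectation

b = Fraction(1, 3)
for n in range(1, 6):
    print(n, product_bias(b, n), b ** n)
```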

The relation $\Xi_{f,n}(b) = b^n$ for the multiplication function $f$ (as mentioned
before) can be interpreted as follows: For each new (independent $b$-biased)
bit source $X_{n+1}$ combined with the sources $X_1, X_2, \ldots, X_n$, the
multiplication function "extracts" another factor $b$ in the output bit $P_{n+1}$
(compared with $P_n$). So if we replace the multiplication function by an
(asymptotically, as $n \to \infty$) optimal function $f$, we should have at least the
extra multiplicative factor $b$ for every step $n \to n+1$ (i.e. by taking one
additional bit source). Therefore, we define the so-called extraction rate of
$b$ by

$$\Xi(b) := \lim_{n \to \infty} \sqrt[n]{\Xi_n(b)}.$$

The extraction rate can be interpreted as the optimal asymptotic multiplica- 
tive effect of each new input bit source on the resulting bias of the output 
bit. Or - in other words - it is the asymptotical (as $n \to \infty$) speed of the
diminution of the bias per new random bit source, when the final output
bit is produced by adding (mod 2) (in $\mathbb{B}$) $n$ independent biased random
bit sources. It can be shown that for Lebesgue-almost all $b \in\, ]0, 1[$ we have
$\Xi(b) = 0$ (see Näslund, Russell (2001), Theorem 21). For rational $b$ we have
the following:

^2 If we identify $\mathcal{B}$ and $\mathbb{B}$ in the natural way, then multiplication in $\mathcal{B}$
corresponds to addition mod 2 in $\mathbb{B}$.

Theorem 4.2. If $b \in \mathbb{Q}$, $b = r/s$, $r, s \in \mathbb{N}$, $r, s$ relatively prime, then $\Xi(b) = 1/s$.

So interestingly enough, it is not the size of b, but rather its arithmetic 
properties that determine its extraction rate! 
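
For very small $n$ the optimal bias can be computed exhaustively. The sketch below is an illustration only, and it rests on an assumption taken from the proof of Lemma 4.2 rather than from the statement of the theorem: each bit equals 1 with probability $r/s$ and $-1$ with probability $(s - r)/s$. A function $f$ is identified with the set $C$ of inputs mapped to 1, so that $|E(f)| = |2w(C) - 1|$; since all points with the same number of ones have the same probability, it suffices to track how many points are taken from each level. The name `min_bias` is hypothetical.

```python
from fractions import Fraction
from math import comb

def min_bias(r, s, n):
    """Smallest |E f| over all f: {1,-1}^n -> {1,-1}, assuming P(X_i = 1) = r/s."""
    q = s - r
    reachable = {0}                          # attainable values of s^n * w(C)
    for i in range(n + 1):                   # level i: points with exactly i ones
        weight = r ** i * q ** (n - i)       # s^n * probability of one such point
        reachable = {w + t * weight
                     for w in reachable
                     for t in range(comb(n, i) + 1)}
    best = min(abs(2 * w - s ** n) for w in reachable)
    return Fraction(best, s ** n)

r, s = 4, 5
for n in (1, 2, 3):
    multiplication = Fraction(2 * r - s, s) ** n   # bias of the product of n bits
    print(n, min_bias(r, s, n), multiplication)
```

In this example the exhaustive optimum for $n = 3$ is $3/125$, compared with $27/125$ for plain multiplication.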

Proof of Theorem 4.2: 1. We first prove that

$$\sqrt[n]{\Xi_n(b)} \ge \frac{1}{s}. \qquad (4.1)$$

Let us fix some notation. For a subset $C \subset \mathcal{B}^n$ define its weight by

$$w(C) := P((X_1, X_2, \ldots, X_n) \in C),$$

and put

$$f(X_1, X_2, \ldots, X_n) := 2\left(1_C(X_1, X_2, \ldots, X_n) - \tfrac12\right).$$

Now consider a collection (subset) $C \subset \mathcal{B}^n$ with $w(C) = \frac{1 + \delta}{2}$, where $|\delta|$ is
the bias of $f$. W.l.o.g. we may suppose that $(-1, -1, \ldots, -1) \notin C$. Then we
may calculate

$$w(C) = \frac{1 + \delta}{2} = \frac{1}{s^n} \sum_{i=1}^{n} t_i\, r^i (s - r)^{n-i}$$

for some integers $t_i \in \{0, 1, \ldots, \binom{n}{i}\}$, or - equivalently -

$$s^n (1 + \delta) = 2 \sum_{i=1}^{n} t_i\, r^i (s - r)^{n-i}.$$

Since $\delta \ne 0$ and $b > 0$ we have that $r \ge 1$ is a divisor of the right-hand side of
the above equality. Furthermore, we have supposed that $r$ and $s$ are relatively
prime. So the left-hand side must be an integer (since the right-hand side is)
and inequality (4.1) follows.

2. Now we turn to the other direction. We will construct a family of functions
$f_n : \mathcal{B}^n \to \mathcal{B}$ with the property that

$$\sqrt[n]{|E(f_n(X_1, X_2, \ldots, X_n))|} \to \frac{1}{s}. \qquad (4.2)$$

For this, we will prove the following lemma, which is also of some independent
interest. Then (4.1) and (4.2) will yield the result of Theorem 4.2. □
Lemma 4.2. If $b$ is as in Theorem 4.2, we have that

$$\Xi(b) \le \frac{1}{s}.$$

More precisely, for $n \ge 2r + 1$ we obtain

$$\Xi_n(b) \le \frac{2r(s - r)^2}{s^n}$$



and there exists a (deterministic) polynomial-time algorithm for finding an 
optimal f , such that 






2r{s — r)^ 
s" 



Proof: Define q := s — r, so that we have | ^ = 1. Since we have supposed 

& > 0, it follows that r > q. Let be the t-th level of S”, i.e. those elements 
of with Hamming weigth (number of ones) i. Let P'("'\b) denote the 

fn) 

probability that an element of B^ is equal to some fixed element oi x & B\ . 
This probability is indeed independent of the specific x and given by 






Hence in our case, we have 



We want to find collections C„ C B^ such that s^w{Cn) is ’’close” to 
Then for the function 

we will have that E{fn{Xi, X 2 , ■ ■ ■ , Xn)) will be close to 0. 

For this construction we proceed as follows: Define an initial collection 

:= si") U Si"_)i U Si"j2 U T, 

where T is a maximal subset of which |S^)”)\T| > r — 1 

(for 1 < j < n — 3) and s^w{Cn) < Now let us adjust this collection 
suitably to bring its weight (multiplied by s”) closer to Since r > q and 
Pi^\b) < Pj"'\b) (for i < j) and by the maximality of T we get 

\s-w{C^) - LyJ I < s-Pi%{b) = r-~V- 

Now consider the cyclic group ^^n- 2^2 and denote hy tt : Z ^ ^^n- 2^2 the 
canonical projection. Since r and q are relatively prime, it follows that for 
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every i > 2, the element 7 r(r"“*< 7 *) has order in so that we 

obtain the following chain (or ’’tower”) of subgroups of Zj.n- 2 ^ 2 -. 

0 = (^(r”-2g2)) c (^(r”-3g3)) c . . . C (^(rg”-i)) 

((x) denotes the cyclic subgroup generated by x). All the groups in the above 
chain have index r in the next one and the last one (( 7 r(rg”“^))) has index 
rq^ in Z^n- 2 q 2 . Thus the group {'K{rq^~^)) can be used to approximate every 
element of 'Z^n- 2^2 to within an additive error of rq^ . In particular for A := 
7 t([^J — s”tu(C„)), there exists an element Z\' S ( 7 r(rq'”“^)) such that 

A- a: & {7t(0), 7t(1), . . . , Ti{rq^ - 1)}. 

On the other hand, we of course may write A' =: cK{rq^~^), so by well-known 
algebraic facts (see Naslund, Russell (2001), p. 308) one has an equation 

n— 3 

Z\' = ^ ti7T(r*g"-3 G (^(rg"-i)) 

i=l 

with integers ti G {0, 1, . . . , r — 1}. As r > q, we may ’’lift” this equation to 

n n—Z 

LyJ = ^ + w{Cn)s^ + E 

i^l 

(where m < nr and ^ G {0, 1? ■ • ■ 5 ~ 1} represents the error term). Now, if 

(n) 

we add U elements of to (7„ and, on the other hand, remove m elements 
of b 1^22 from Cn (which is possible as long as m < („” 2 )> ^ ^ 

indeed obtain a new collection Cn with 



„n 

^"MCn) - LyJ < 



Dividing this equation by s" yields 



En(b) < 



2rq^ 

s” 



and the result follows (since each step of the above-described algorithm can 
be carried out in polynomial time).D 




5 Pseudo-random Number Generators 



5.1 Linear Feedback Shift Registers 

In contrast to Chapter 4, where we discussed genuine physical random number
generators, here we will deal with so-called pseudo-random number generators.
These are deterministic algorithms that produce an output which behaves
"more or less" like random numbers. The advantage is that in this way much
more data can be generated per time unit than with physical devices. On the
other hand, pseudo-random numbers never have the quality of genuine random
numbers. There is a definition of "perfect pseudo-randomness" in the sense
that, loosely speaking, a source is perfectly pseudo-random if it cannot
"efficiently" be distinguished from genuine random numbers. We will deal with
that in more detail in Section 5.3. However, this test is not practically
implementable. In reality, one can only test for finitely many necessary
conditions for a source to be considered as "sufficiently random". Normally,
one tries to generate a uniform random variable $U$ on the interval [0, 1[ (more
precisely: one approximates $U$ by a discrete uniform distribution on the set
$\{k/N : 0 \le k \le N - 1\}$, where $N$ is chosen sufficiently large).

In the sequel, w.l.o.g. we will interpret (pseudo-)random numbers as bit
sequences $\{x_i\}_{i \ge 0} = \{x_0 x_1 \ldots\}$ ($x_i \in \mathbb{B} = \{0, 1\}$).

A catalog of some "minimal" requirements for pseudo-random generators was
established in 1967 by S. Golomb. For this, we need some preparation. As a
computer has only finitely many memory places, every pseudo-random sequence
generated by a computer is eventually periodic. In the sequel, we consider
only periodic pseudo-random number generators. Let $p$ be the period of the
pseudo-random number generator, i.e., the smallest natural number with the
property $x_{i+p} = x_i$ ($\forall i \ge 0$). A run is here, by definition, a sequence of
identical elements of $\mathbb{B}$; we will speak of a block, resp. a gap, if it is a run
of ones, resp. zeroes. Let $A(k)$, resp. $D(k)$, be the number of matches, resp.
non-matches, of $\{x_i\}_{i \ge 0}$ and $\{x_{i+k}\}_{i \ge 0}$ counted over a whole period:

$$A(k) := |\{i \in \{0, 1, \ldots, p - 1\} : x_i = x_{i+k}\}|,$$

$$D(k) := |\{i \in \{0, 1, \ldots, p - 1\} : x_i \ne x_{i+k}\}|.$$



D. Neuenschwander: Prob. and Stat. Methods in Cryptology, LNCS 3028, pp. 57-75, 2004. 
© Springer- Verlag Berlin Heidelberg 2004 
The autocorrelation $AC(k)$ of the periodic sequence $\{x_i\}_{i \ge 0}$ is defined as

$$AC(k) := \frac{A(k) - D(k)}{p}.$$

If $k$ is a multiple of the period length $p$, then one speaks of "in-phase"
autocorrelation; in this case we always have $AC(k) = 1$. In the other case,
$AC(k)$ is called "out-of-phase" autocorrelation; it always lies between $-1$ and
1. Now Golomb's conditions are the following:

(G1) The number of zeroes and the number of ones per period are $p/2$ (if $p$
is even) and $(p \pm 1)/2$ (if $p$ is odd) (i.e. zeroes and ones appear with
approximately the same probability).

(G2) In a cycle, half of the runs have length 1, a quarter of them length 2,
an eighth length 3, .... Half of the runs of a certain length are blocks, the
other half gaps. (This condition says that, e.g., after 01, the zero again has
the same probability as the one, etc.)

(G3) The out-of-phase autocorrelation $AC(k)$ has the same value for all $k$.
(This can be interpreted as follows: If one counts the number of matches
between a sequence and its shift by $k$ places, one does not obtain any
information about the period $p$ of this sequence, except when $k$ is a
multiple of $p$.)

We now consider bit sequences that are generated by so-called linear feedback
shift registers (LFSR). The advantage of LFSR is that they are very easy to
implement in hardware and very fast. An LFSR of length $n$ has a vector of $n$
places of memory. At the beginning, the initial state vector
$(x_0, x_1, \ldots, x_{n-1}) \in \mathbb{B}^n$ is stored. The most important part of an LFSR is
the so-called (linear) feedback function:

$$f : \mathbb{B}^n \ni (x_0, x_1, \ldots, x_{n-1}) \mapsto f(x_0, x_1, \ldots, x_{n-1}) = \sum_{i=0}^{n-1} c_i x_i \in \mathbb{B},$$

where $c_0, c_1, \ldots, c_{n-1} \in \mathbb{B}$ are fixed (built-in) parameters. After the first
step, the LFSR will give the leftmost bit $x_0$ as output, delete this bit from
its memory place, shift the contents of all the other memory places one place
to the left, and put the value $x_n = f(x_0, x_1, \ldots, x_{n-1})$ in the rightmost
memory place. (We note that instead of bits, one can work with elements
of an arbitrary finite ring. Then the classical linear congruence generators
are just shift registers of length $n = 1$ over a residue ring.) W.l.o.g. we may
assume $c_0 = 1$, for otherwise one could replace the LFSR by an LFSR of length
$n - 1$. An output sequence of an LFSR will be called a pseudo-noise sequence
(PN-sequence) if it has the (maximal possible) period $p = 2^n - 1$. Since an
LFSR generating a PN-sequence must assume all its possible non-zero states and
since the output sequence is uniquely determined by the initial state,
PN-sequences are automatically periodic. In the sequel, we want to investigate
which LFSR generate PN-sequences. For this, we introduce the important notion
of the characteristic polynomial (or recursion polynomial) of an LFSR: If one
puts $c_n := 1$, then the polynomial

$$f(z) := \sum_{i=0}^{n} c_i z^i$$

is called the characteristic polynomial (or recursion polynomial) of the LFSR.
With $\Omega(f)$ we denote the set of all possible output sequences of the LFSR
with characteristic polynomial $f$:

$$\Omega(f) := \Big\{\{x_i\}_{i \ge 0} : x_{k+n} = \sum_{i=0}^{n-1} c_i x_{k+i} \ (k \ge 0)\Big\}.$$

One sees easily that $\Omega(f)$ is a vector space of dimension $n$ over the field $\mathbb{B}$. If
$f(z) = \sum_{i=0}^{n} c_i z^i$ denotes a polynomial with coefficients in $\mathbb{B}$, then we define
the corresponding dual polynomial $f^*(z)$ by

$$f^*(z) = z^n f(1/z) = \sum_{i=0}^{n} c_i z^{n-i}.$$

Clearly, $f^{**}(z) = f(z)$ and $(f \cdot g)^*(z) = f^*(z) \cdot g^*(z)$. For the output sequence
$\{x_i\}_{i \ge 0}$ we consider the generating function

$$S(z) = \sum_{i=0}^{\infty} x_i z^i$$

(interpreted as formal power series).

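Lemma 5.1. If one puts

$$t(z) := \sum_{j=0}^{n-1} \Big(\sum_{\ell=0}^{j} c_{n-\ell}\, x_{j-\ell}\Big) z^j,$$

then it follows that

$$S(z) = \frac{t(z)}{f^*(z)}.$$

Proof:

$$S(z) f^*(z) = \Big(\sum_{k=0}^{\infty} x_k z^k\Big)\Big(\sum_{\ell=0}^{n} c_{n-\ell} z^{\ell}\Big)$$

$$= \sum_{j=0}^{\infty} \Big(\sum_{\ell=0}^{\min\{j,n\}} c_{n-\ell}\, x_{j-\ell}\Big) z^j$$

$$= \sum_{j=0}^{n-1} \Big(\sum_{\ell=0}^{j} c_{n-\ell}\, x_{j-\ell}\Big) z^j + \sum_{j \ge n} \Big(\sum_{\ell=0}^{n} c_{n-\ell}\, x_{j-\ell}\Big) z^j$$

$$= t(z) + \sum_{j \ge n} \Big(\sum_{i=0}^{n} c_i\, x_{(j-n)+i}\Big) z^j$$

$$= t(z),$$

where the inner sums for $j \ge n$ vanish by the recursion, since $c_n = 1$. □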

As $|\Omega(f)| = 2^n$ and since there are exactly $2^n$ polynomials of degree $< n$
over $\mathbb{B}$, one obtains (by identifying the output sequence with its generating
function) the following:

Corollary 5.1.

$$\Omega(f) = \Big\{\frac{t(z)}{f^*(z)} : \deg t(z) < n\Big\}.$$

Lemma 5.2. Suppose $\{x_i\}_{i \ge 0} \in \Omega(f)$, $\{y_i\}_{i \ge 0} \in \Omega(g)$. Then

$$\{x_i + y_i\}_{i \ge 0} \in \Omega(\mathrm{lcm}(f, g)).$$

Proof: Based on Corollary 5.1, we write $S(z) = \alpha(z)/f^*(z)$, $T(z) = \beta(z)/g^*(z)$
(where $\deg \alpha(z) < \deg f(z)$, $\deg \beta(z) < \deg g(z)$). Furthermore,
we put $h = \mathrm{lcm}(f, g)$ and define the polynomials $u$ and $v$ by $h = u \cdot f$ and
$h = v \cdot g$. As

$$S(z) + T(z) = \frac{\alpha(z)}{f^*(z)} + \frac{\beta(z)}{g^*(z)} = \frac{\alpha(z)u^*(z) + \beta(z)v^*(z)}{h^*(z)},$$

and, on the other hand, $\alpha(z)u^*(z)$ and $\beta(z)v^*(z)$ both have lower degree than
$h(z)$, it follows that $S(z) + T(z) \in \Omega(h)$. □

From the theory of finite fields, the following is known:

Lemma 5.3. (i) For every polynomial $f(z)$ with $f(0) = 1$ there exists an
$m \in \mathbb{N}$ such that $f(z)$ is a divisor of $z^m + 1$. The smallest such $m$ is called
the period of the polynomial $f(z)$.

(ii) If $n := \deg f(z)$ and if $f(z)$ is irreducible, then the period of $f(z)$ divides
$2^n - 1$. If the irreducible polynomial $f(z)$ has the (maximal) period $2^n - 1$,
then the polynomial $f(z)$ is called primitive.

(iii) The number of primitive irreducible polynomials of degree $n$ is given by
$\varphi(2^n - 1)/n$ ($\varphi$ denoting the Euler totient function).
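
Parts (i) and (iii) of the lemma can be explored computationally for small degrees. The following sketch is illustrative only: polynomials over GF(2) are encoded as integer bit masks (bit $i$ is the coefficient of $z^i$), and the names `polynomial_period`, `count_primitive` and `euler_phi` are hypothetical. It uses the fact that a polynomial of degree $n$ with $f(0) = 1$ and period $2^n - 1$ is necessarily irreducible, so counting such polynomials counts the primitive ones.

```python
from math import gcd

def polynomial_period(f, n):
    """Smallest m >= 1 with f(z) | z^m + 1 over GF(2); needs deg f = n, f(0) = 1."""
    x, m = 1, 0
    while True:
        x <<= 1                   # multiply by z
        if (x >> n) & 1:          # reduce modulo f when degree n is reached
            x ^= f
        m += 1
        if x == 1:                # z^m = 1 (mod f)
            return m

def count_primitive(n):
    """Number of degree-n polynomials with f(0) = 1 and period 2^n - 1."""
    full = 2 ** n - 1
    count = 0
    for middle in range(2 ** (n - 1)):
        f = (1 << n) | (middle << 1) | 1
        if polynomial_period(f, n) == full:
            count += 1
    return count

def euler_phi(m):
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)

print(polynomial_period(0b10011, 4))   # z^4 + z + 1
print(polynomial_period(0b11111, 4))   # z^4 + z^3 + z^2 + z + 1
for n in range(2, 7):
    print(n, count_primitive(n), euler_phi(2 ** n - 1) // n)
```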

Lemma 5.4. If the polynomial $f(z)$ has period $m$ and degree $n$ and if
$\{x_i\}_{i \ge 0} \in \Omega(f)$, then the period of $\{x_i\}_{i \ge 0}$ divides $m$.

Proof: Let $g(z)$ be a polynomial such that

$$z^m + 1 = f(z) \cdot g(z) \qquad (5.1)$$

and with degree $m - n$. If on both sides of (5.1) one passes to the dual
polynomial, one obtains

$$z^m + 1 = f^*(z) \cdot g^*(z).$$

By Lemma 5.1 there is a polynomial $t(z)$ of degree $< n$ such that

$$S(z) = \frac{t(z)}{f^*(z)} = \frac{t(z) \cdot g^*(z)}{f^*(z) \cdot g^*(z)} = \frac{t(z) \cdot g^*(z)}{1 + z^m} = t(z) \cdot g^*(z) \cdot (1 + z^m + z^{2m} + \ldots).$$

Since $\deg g^*(z) = m - n$, it follows that $\deg (t(z) \cdot g^*(z)) < m$. So the period
of $S(z)$ divides $m$. □



Lemma 5.5. If the irreducible polynomial $f(z)$ has period $m$ and degree $n$
and if $\{x_i\}_{i \ge 0} \in \Omega(f)$, then $\{x_i\}_{i \ge 0}$ has period $m$.

Proof: Let $p$ denote the period of $\{x_i\}_{i \ge 0}$. By Lemma 5.4, $p$ divides $m$. Hence

$$S(z) = \frac{u(z)}{1 + z^p} \qquad (5.2)$$

for a suitable polynomial $u(z)$ of degree $< p$. On the other hand, due to
Lemma 5.1 we have

$$S(z) = \frac{t(z)}{f^*(z)}. \qquad (5.3)$$

Comparing (5.2) and (5.3) yields

$$(1 + z^p) \cdot t(z) = u(z) \cdot f^*(z)$$

and hence (by passing to the dual polynomial on both sides)

$$(1 + z^p) \cdot t^*(z) = u^*(z) \cdot f(z).$$

As $f(z)$ is irreducible and $t^*(z)$ has degree $< n$, it follows that $f(z)$ is a
divisor of $z^p + 1$. Since $f(z)$ has period $m$, we obtain that $m$ divides $p$. But
as $p$ is also a divisor of $m$ (as seen before), the assertion follows. □

Lemma 5.6. If $f(z)$ is a polynomial of degree $n$ and if $\{x_i\}_{i \ge 0} \in \Omega(f)$ is a
PN-sequence, then $f(z)$ is irreducible.

Proof: From the theory of factorization in rings (here applied to rings of
polynomials), it follows that there exists an irreducible polynomial $f_1(z)$ with
positive degree $n_1$ and a polynomial $f_2(z)$ such that $f(z) = f_1(z) \cdot f_2(z)$. By
Corollary 5.1 we have that $1/f_1^*(z) \in \Omega(f_1)$, so by Lemmas 5.3(ii) and 5.4, the
period of $1/f_1^*(z)$ divides $2^{n_1} - 1$. On the other hand
$1/f_1^*(z) = f_2^*(z)/f^*(z) \in \Omega(f)$, so $1/f_1^*(z)$ must be a shift of $\{x_i\}_{i \ge 0}$ and
thus have period $2^n - 1$. It follows that $n = n_1$, thus $f(z) = f_1(z)$. □
So by Lemmas 5.5 and 5.6, we obtain the following theorem:

Theorem 5.1. The output sequence of an LFSR is a PN-sequence iff the
characteristic polynomial is primitive.

Due to Lemma 5.3(iii), there are thus exactly $\varphi(2^n - 1)/n$ different LFSR of
length $n$ that generate PN-sequences.
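
Theorem 5.1 can be illustrated with a small software model of the register described above. This is a minimal sketch, not an efficient implementation: the coefficient tuple `c` holds $c_0, \ldots, c_{n-1}$ of the feedback function, and the function names are illustrative. The first example uses the primitive polynomial $z^4 + z + 1$, the second the irreducible but non-primitive polynomial $z^4 + z^3 + z^2 + z + 1$ of period 5.

```python
def lfsr_output(c, state, length):
    """First `length` output bits of the LFSR x_{k+n} = sum_i c[i] * x_{k+i} mod 2."""
    state = list(state)
    out = []
    for _ in range(length):
        out.append(state[0])                                   # emit leftmost bit
        feedback = sum(ci & xi for ci, xi in zip(c, state)) % 2
        state = state[1:] + [feedback]                         # shift left, append
    return out

def state_period(c, state):
    """Number of steps until the register returns to its initial state (needs c[0] = 1)."""
    start = list(state)
    current, steps = list(start), 0
    while True:
        feedback = sum(ci & xi for ci, xi in zip(c, current)) % 2
        current = current[1:] + [feedback]
        steps += 1
        if current == start:
            return steps

print(lfsr_output((1, 1, 0, 0), (0, 0, 0, 1), 15))
print(state_period((1, 1, 0, 0), (0, 0, 0, 1)))   # primitive: 2^4 - 1
print(state_period((1, 1, 1, 1), (0, 0, 0, 1)))   # not primitive
```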

One can show that LFSR that generate PN-sequences satisfy Golomb's
conditions (G1)-(G3):

(G1): Since every state occurs exactly once per period and since the leftmost
bit always yields the next output bit, it follows that the number of ones, resp.
zeroes, per period is $2^{n-1}$, resp. $2^{n-1} - 1$.

(G2): There are $2^{n-k-2}$ states whose leftmost $k + 2$ bits have the form
011...10, resp. 100...01. So gaps and blocks of length $k \le n - 2$ occur
exactly $2^{n-k-2}$ times per period. The state 011...1 occurs exactly once. Its
successor state is 11...1, after which the state 11...10 follows. Hence there
is no block of length $n - 1$ and 1 block of length $n$. By analogy, there exists
1 gap of length $n - 1$ and no gap of length $n$.

(G3): If $\{x_i\}_{i \ge 0} \in \Omega(f)$, then also $\{x_{i+k}\}_{i \ge 0} \in \Omega(f)$ and thus (since $\Omega(f)$ is
a vector space) $\{x_i + x_{i+k}\}_{i \ge 0} \in \Omega(f)$. The number of matches per period
between $\{x_i\}_{i \ge 0}$ and $\{x_{i+k}\}_{i \ge 0}$ is equal to the number of zeroes of
$\{x_i + x_{i+k}\}_{i \ge 0}$ per period, which by (G1) has the value $2^{n-1} - 1$. By analogy,
the number of non-matches is $2^{n-1}$. So the out-of-phase autocorrelation
assumes the value

$$AC(k) = \frac{(2^{n-1} - 1) - 2^{n-1}}{2^n - 1} = -\frac{1}{2^n - 1} \qquad (1 \le k < 2^n - 1).$$
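
These three properties can be checked directly on a small example. The sketch below hard-codes one period of the PN-sequence of length 15 produced by the recursion $x_{k+4} = x_k + x_{k+1}$ (characteristic polynomial $z^4 + z + 1$) with initial state 0001; the helper names `autocorrelation` and `cyclic_runs` are illustrative, and `cyclic_runs` assumes a non-constant sequence.

```python
from fractions import Fraction
from itertools import groupby

# One period of the PN-sequence for z^4 + z + 1, initial state (0, 0, 0, 1).
PN = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1]

def autocorrelation(seq, k):
    """AC(k) = (A(k) - D(k)) / p over one period of a periodic sequence."""
    p = len(seq)
    matches = sum(seq[i] == seq[(i + k) % p] for i in range(p))
    return Fraction(matches - (p - matches), p)

def cyclic_runs(seq):
    """(bit, length) of every run in one period, read cyclically."""
    p = len(seq)
    start = next(i for i in range(p) if seq[i] != seq[i - 1])  # a run boundary
    rotated = seq[start:] + seq[:start]
    return [(bit, len(list(group))) for bit, group in groupby(rotated)]

print(sum(PN), len(PN) - sum(PN))                        # (G1): ones, zeroes
print(cyclic_runs(PN))                                   # (G2): blocks and gaps
print({autocorrelation(PN, k) for k in range(1, 15)})    # (G3): a single value
```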

Of course, every finite bitsequence can be produced by an LFSR. The length 
of the shortest such LFSR can be determined by the Berlekamp-Massey algo- 
rithm (see Section 7.11) and is called the linear complexity of the bitsequence. 
Non-linear filtering of PN-sequences can lead to high linear complexity (see 
Kalouptsidis, Kolokotronis (2003)). 



5.2 The Shrinking and Self-shrinking Generators 

The shrinking generator consists of two LFSR over GF(2), an LFSR $a =
(a(0), a(1), \ldots)$ and a second one (called the selector) $s = (s(0), s(1), \ldots)$.
Now the output of the generator will be the x-sequence, which is a "shrunken"
version of the a-sequence, in the sense that the element $a(i)$ will be included
in the x-sequence if $s(i) = 1$, otherwise it will be discarded. In other (more
formal) words:

$$x(k) := a(i_k),$$

where $i_k$ denotes the position of the $k$-th 1 in the selector sequence $s$. The
shrinking generator is easy to implement and has, as we will see, good
statistical properties. First, we will investigate the period and the linear
complexity of the x-sequence. Let $T_a$, resp. $|a|$, denote the period, resp.
length, of the LFSR $a$ (and analogously for $s$ and $x$).
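
The construction can be sketched in a few lines. The example below is a toy illustration under stated assumptions: $a$ is the LFSR with characteristic polynomial $z^3 + z + 1$ (period 7), the selector $s$ has characteristic polynomial $z^2 + z + 1$ (period 3), and the helper names are hypothetical. For these parameters the observed period of the x-sequence is $14 = (2^3 - 1) \cdot 2^{2-1}$.

```python
def lfsr_bits(c, state, length):
    """Output bits of the LFSR x_{k+n} = sum_i c[i] * x_{k+i} mod 2."""
    state = list(state)
    out = []
    for _ in range(length):
        out.append(state[0])
        feedback = sum(ci & xi for ci, xi in zip(c, state)) % 2
        state = state[1:] + [feedback]
    return out

def shrink(a_bits, s_bits):
    """Keep a(i) exactly when the selector bit s(i) equals 1."""
    return [a for a, s in zip(a_bits, s_bits) if s == 1]

def minimal_period(bits, max_period):
    """Smallest p <= max_period with bits[i] == bits[i + p] throughout, else None."""
    for p in range(1, max_period + 1):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return None

a = lfsr_bits((1, 1, 0), (0, 0, 1), 84)   # four joint periods of length 7 * 3
s = lfsr_bits((1, 1), (0, 1), 84)
x = shrink(a, s)
print(x[:14])
print(minimal_period(x, 28))
```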

Theorem 5.2. If $a$ and $s$ have primitive characteristic polynomials and if $T_a$
and $T_s$ are relatively prime, then

$$T_x = (2^{|a|} - 1)\, 2^{|s|-1}.$$

Proof: W.l.o.g. we may assume that

$$|a| > \log_2 |s|. \qquad (5.4)$$

Since the s-sequence has (due to the primitivity of its characteristic
polynomial) $2^{|s|-1}$ elements 1 in a full period, one observes that

$$x(i + j 2^{|s|-1}) = a(k_i + jT_s), \qquad (5.5)$$

where $k_i$ denotes the position of the $i$-th 1 in the s-sequence.
Furthermore, if for any indexes $k, k'$ we have that $a(k + jT_s) = a(k' + jT_s)$
for all $j$, then it follows that

$$T_a \mid k - k'. \qquad (5.6)$$

[Since the characteristic polynomial of $a$ is primitive and since $T_a$ and $T_s$ are
relatively prime, it follows that the characteristic polynomial of the sequence
$\{a(k + jT_s)\}_{j \ge 0}$ is also primitive, hence this sequence also has period $T_a$.]
Clearly, we have

$$T_x \mid T_a 2^{|s|-1}.$$

Since $x(i + j2^{|s|-1}) = x(i + T_x + j2^{|s|-1})$ for all $i$ and $j$, together with (5.5)
and (5.6) we obtain that

$$T_a \mid k_{i+T_x} - k_i \qquad (5.7)$$

for all $i$. Or - in other words - for every $i$ there exists a $j_i$ such that

$$k_{i+T_x} = k_i + j_i T_a. \qquad (5.8)$$

Replacing $i$ by $i + 1$ in (5.8) yields

$$k_{i+1+T_x} = k_{i+1} + j_{i+1} T_a. \qquad (5.9)$$

Now we subtract (5.8) from (5.9), giving

$$k_{i+T_x+1} - k_{i+T_x} = k_{i+1} - k_i + (j_{i+1} - j_i) T_a \qquad (5.10)$$

for all $i$. On the one hand $k_{i+T_x}$ and $k_{i+T_x+1}$, but on the other hand $k_i$ and
$k_{i+1}$ are also positions of consecutive ones in the s-sequence. So if
$j_{i+1} - j_i \ne 0$, we would have at least $T_a$ consecutive zeros somewhere in the
s-sequence, which by assumption (5.4) has been ruled out. So $j_{i+1} = j_i$ and hence

$$k_{i+T_x+1} - k_{i+T_x} = k_{i+1} - k_i \qquad (5.11)$$

for all $i$, which yields that the subsequences of $s$ starting at the elements
$s(k_i)$, resp. $s(k_{i+T_x})$, are identical. This is only possible if
$T_s \mid k_{i+T_x} - k_i$, hence the number of elements in the s-sequence between
$s(k_i)$ and $s(k_{i+T_x})$ is a multiple of the period of $s$. However, then the
number of ones in this segment is a multiple of $2^{|s|-1}$. But on the other hand,
this number is also $T_x$, so there exists a $t \in \mathbb{N}$ such that

$$T_x = t\, 2^{|s|-1}. \qquad (5.12)$$

Relation (5.5) implies

$$a(k_0) = x(0) = x(jT_x) = x(jt2^{|s|-1}) = a(k_0 + jtT_s) \qquad (5.13)$$

for all $j$. Thus $T_a \mid tT_s$ and hence (since $T_a$ and $T_s$ are supposed to be relatively
prime) $T_a \mid t$, which, by (5.12), entails that $T_a 2^{|s|-1} \mid T_x$. □

For the linear complexity $L$ of the x-sequence, we get the following estimate:

Theorem 5.3. Under the hypotheses of Theorem 5.2, we have

$$|a|\, 2^{|s|-2} < L \le |a|\, 2^{|s|-1}.$$

Proof: 1. Upper bound for $L$: In order to find an upper bound for $L$, we
want to look for a polynomial $P(\cdot)$ such that (by a little abuse of notation)
$P(x) = 0$ for all possible outcomes of the sequence $x$ (i.e., the coefficients
of $P(z) = \sum_k p_k z^k$ represent the linear relation satisfied by the
elements of the x-sequence). Let $x_{[s]}$ denote the sequence $\{x(j2^{|s|-1})\}_{j \ge 0}$.
From (5.5), the elements of this sequence are all of the form $a(i + jT_s)$.
By the hypothesis that $T_a$ and $T_s$ are relatively prime, the sequence just
described must have the same linear complexity as the original a-sequence,
so it has to satisfy a polynomial equation $Q(\cdot) = 0$ of degree $|a|$. But then
also the sequence $x_{[s]}$ has to satisfy this equation, i.e., $Q(x_{[s]}) = 0$ (with a
little abuse of notation). Now define $P(z) := Q(z^{2^{|s|-1}})$. The polynomial $P$
satisfies $P(x) = 0$ and has degree $|a|\, 2^{|s|-1}$, which is an upper bound for $L$.

2. Lower bound: Denote by $M(z)$ the minimal polynomial for the sequence
$x$. Since $Q(x_{[s]}) = 0$, it follows that the polynomial $M(z)$ is a divisor of the
polynomial

$$P(z) = Q(z^{2^{|s|-1}}) = Q(z)^{2^{|s|-1}},$$

hence

$$M(z) = Q(z)^t$$

for some $t \le 2^{|s|-1}$. Now assume that the lower bound asserted in Theorem
5.3 is not true and let $t \le 2^{|s|-2}$. Then $M(z)$ divides $Q(z)^{2^{|s|-2}}$. Since $Q(z)$
is an irreducible polynomial of degree $|a|$, it divides the polynomial $1 + z^{T_a}$,
so it follows that the polynomial $M(z)$ divides the polynomial

$$(1 + z^{T_a})^{2^{|s|-2}} = 1 + z^{T_a 2^{|s|-2}},$$

which entails that the period of the x-sequence can be at most $T_a 2^{|s|-2}$. But
this is a contradiction to Theorem 5.2, hence we must indeed have $t > 2^{|s|-2}$. □

The next theorem, which we state without proof, gives statistical properties 
of the shrinking generator. The general assertion is that one can show that 
the distribution of the output sequence of a shrinking generator is ’’near” 
to the distribution of a genuine unbiased random sequence in the following 
sense: 

Theorem 5.4. Consider a shrinking generator as above. Denote by U a gen- 
uine unbiased random sequence of length n. Let b G {0, 1, *}" be any template. 
Then 

\E(templateb{X^^^) — E{templateb{U)\ = 0 (^ 77 )- 

21^1 

Furthermore, assume X(i) and X( 2 ) are two elements of the x-sequence with 
distance Then the correlation between X(i) and X( 2 ) bounded by 

(See Coppersmith et al. (1994), Theorem 13 and Corollary 14). 

A concept related to the shrinking generator is the so-called self-shrinking
generator. There, one works with only one LFSR and consecutive
(non-overlapping) pairs of its output bits. If the first bit of the pair is a 1,
then the second bit of the pair is included in the x-sequence (output) of the
self-shrinking generator, otherwise the pair is discarded. For more information
about the self-shrinking generator see Meier, Staffelbach (1995) and Blackburn
(1999). In the latter paper, the maximum linear complexity conjectured
by Meier and Staffelbach (1995) is proven.
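
The selection rule of the self-shrinking generator is easily expressed in code. The following minimal sketch operates on an arbitrary list of bits rather than on a concrete LFSR; the function name `self_shrink` is an illustrative choice, and an unpaired trailing bit is simply ignored.

```python
def self_shrink(bits):
    """Output the second bit of every non-overlapping pair whose first bit is 1."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first == 1:
            out.append(second)      # pair (1, b) contributes b
        # pair (0, b) is discarded
    return out

print(self_shrink([1, 0, 0, 1, 1, 1, 0, 0, 1, 0]))
```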



5.3 Perfect Pseudo-randomness 

In this section, we give a definition of so-called "perfect" pseudo-randomness.
Loosely speaking, this means a pseudo-random source that cannot "efficiently"
be distinguished by a computer from a truly random sequence. However, the test
for perfect pseudo-randomness is not practically implementable.
For a function $f(n)$ ($n \in \mathbb{N}$) we will write $f(n) = O(\nu(n))$ if
$f(n) = O(1/g(n))$ ($n \to \infty$) for every polynomial $g(z)$. In this case, we will say
that the function $f(n)$ is negligible. A model $M$ is called a perfect simulation
of a source $S$ if for every probabilistic polynomial-time algorithm
$D : \mathbb{B}^n \to \mathbb{B}$ we have

$$|P_S(D = 1) - P_M(D = 1)| = O(\nu(n)).$$

This means that no probabilistic polynomial algorithm can distinguish $S$
from $M$ with non-negligible probability, or, in other words, that $S$ and $M$
are polynomially indistinguishable. If $D$ did not satisfy the above inequality,
then we would say that $D$ is a distinguishing algorithm. The following theorem
states that the so-called Comparative Next Bit Test is a test of perfect
pseudo-randomness. However, this test is of asymptotic nature and involves
a formulation of type "for every polynomial-time algorithm", so it is only of
theoretical value, since there are infinitely many such algorithms. But even
in theory, up to now it is not yet known if perfect pseudo-random generators
actually exist!

Definition 5.1. A source $S$ passes the Comparative Next Bit Test with respect
to a model $M$ if, for every $i \in \{1, 2, \ldots, n\}$ and every probabilistic
polynomial-time algorithm $A : \mathbb{B}^{i-1} \to \mathbb{B}$, we have that

$$|P_S(A(x^{(i-1)}) = x_i) - P_M(A(x^{(i-1)}) = x_i)| = O(\nu(n)).$$

Theorem 5.5. A model M is a perfect simulation of a source S iff S passes 
the Comparative Next Bit Test with respect to M . 

Proof: The "only if"-direction is easy to see by contraposition. What is more
difficult is the "if"-direction, which we will verify in the following. Suppose
$S$ is not a perfect simulation of $M$. We have to prove that $S$ does not pass
the Comparative Next Bit Test with respect to $M$. Let $D : \mathbb{B}^n \to \mathbb{B}$ be a
distinguishing algorithm, i.e.,

$$|P_S(D(x^{(n)}) = 1) - P_M(D(x^{(n)}) = 1)| \ge n^{-k}$$

for some constant exponent $k$. Let $p_i^S$ denote the probability that the
algorithm $D$ gives 1 as output when the first $i$ bits of its input are taken
out of the source $S$ and the rest are i.i.d. unbiased random bits. By
replacing $S$ by $M$ in the above sentence, we define $p_i^M$ analogously. Consider
the difference $d_i := p_i^S - p_i^M$. It holds that $p_n^S = P_S(D(x^{(n)}) = 1)$,
$p_n^M = P_M(D(x^{(n)}) = 1)$, $p_0^S = p_0^M = P_U(D(x^{(n)}) = 1)$ (where $U$ means a
source of genuine i.i.d. unbiased random bits). Thus as $d_0 = 0$
and $|d_n| = |p_n^S - p_n^M| \ge n^{-k}$, there must be an $i$ such that
$|d_i - d_{i-1}| \ge n^{-(k+1)}$.

W.l.o.g. $d_i - d_{i-1} > 0$. The Comparative Next Bit Test A inputs in D the
concatenated bitstring $(x_1, \ldots, x_{i-1}, \tilde{x}_i, \ldots, \tilde{x}_n)$ (where $(x_1, \ldots, x_{i-1})$ is
produced by S or M and $(\tilde{x}_i, \ldots, \tilde{x}_n)$ is a
bitstring generated by running the source U $n-i+1$ times). The output will
be $\tilde{x}_i$ if D yields 1 and $1 - \tilde{x}_i$ else. Now let $x_1, x_2, \ldots, x_i$ be bits produced by
S or M and let $\bar{p}_i^S$ resp. $\bar{p}_i^M$ be the probability that the distinguisher D yields
1 as output when bits number $1, 2, \ldots, i-1$ are given by $x_1, x_2, \ldots, x_{i-1}$, bit
number i is $1 - x_i$, and the rest are i.i.d. unbiased random bits.

Then we have

$$p_{i-1}^S = \tfrac{1}{2}(p_i^S + \bar{p}_i^S), \qquad p_{i-1}^M = \tfrac{1}{2}(p_i^M + \bar{p}_i^M),$$

and thus

$$P_S(A = x_i) - P_M(A = x_i) = \tfrac{1}{2}(p_i^S + 1 - \bar{p}_i^S) - \tfrac{1}{2}(p_i^M + 1 - \bar{p}_i^M) = d_i - d_{i-1} > n^{-(k+1)},$$

so S does not pass the Comparative Next Bit Test with respect to M. □

The property that the Comparative Next Bit Test checks can be called
unpredictability, more precisely forwards unpredictability. Since the property
of a pseudorandom number generator to be perfectly pseudorandom or not
does not change if the output bits are taken in reverse order, forwards
unpredictability is equivalent to backwards unpredictability.

A permutation $f(.)$ is called one-way, if its result can be calculated in
polynomial time, but on the other hand, for any probabilistic polynomial-time
algorithm A the probability $P(A(f(x)) = x)$ is negligible. A predicate (bit)
$B(.)$ is called hard-core for the permutation $f(.)$ if $B(f(x))$ can be determined
in polynomial time whereas for all probabilistic polynomial-time algorithms
A, the difference $P(A(x) = B(x)) - 1/2$ is negligible. If there exists a hard-core
bit, then the permutation has to be one-way. Blum and Micali (1984) have
proved that every one-way permutation $f(.)$ with hard-core bit $B(.)$ gives
rise to a perfect pseudorandom generator as follows:

Theorem 5.6. (Blum and Micali.) Assume $f(.)$ is a one-way permutation
with hard-core bit $B(.)$. Then the iteration

$$g(x) = (B(f(x)), B(f(f(x))), B(f(f(f(x)))), \ldots)$$

yields a perfect pseudorandom generator.

Proof: We will show that the above generator is not backwards predictable.
Assume the contrary. Then one could guess, in polynomial time and with
non-negligible probability of success, the value $B(f^n(x))$ (the n-fold iteration of f)
given the set of values $S = \{B(f^m(x)) : m \ge n+1\}$. But S can be computed
in polynomial time from $f^n(x)$. So one can guess in polynomial time and with
probability of success non-negligibly greater than 1/2 the value $B(f^n(x))$. So
$B(.)$ cannot be hard-core, since $f^n(x)$ has the same distribution as $f(x)$ by
the fact that $f(.)$ is supposed to be a permutation. □
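
The iteration of Theorem 5.6 can be sketched in a few lines of Python. This is only an illustration of the mechanics: the function name `blum_micali_bits` and the toy choices `toy_f` and `toy_B` are assumptions made here, and the toy permutation is of course neither one-way nor does the toy bit have any hard-core property.

```python
def blum_micali_bits(f, B, seed, count):
    """Return the first `count` bits B(f(x)), B(f(f(x))), ... for x = seed."""
    bits = []
    x = seed
    for _ in range(count):
        x = f(x)           # next iterate f^m(x)
        bits.append(B(x))  # output bit B(f^m(x))
    return bits


# Toy stand-ins (assumptions for illustration only, NOT cryptographically useful):
def toy_f(x):
    return (3 * x) % 7     # a permutation of {1, ..., 6}


def toy_B(x):
    return x & 1           # least significant bit


print(blum_micali_bits(toy_f, toy_B, seed=1, count=6))  # [1, 0, 0, 0, 1, 1]
```

The seed is the only source of randomness; the loop merely stretches it, which is the point made in the remark following the proof.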

In the Blum-Micali generator, x plays the role of a random seed. So these
generators rather serve to improve randomness than produce it.

5.4 Local Statistics and de Bruijn Shift Registers

A general feedback shift register (FSR for short) of length n is a feedback shift
register that is defined like an LFSR with the exception that the feedback
function f need not be linear, but can be an arbitrary function $f : \mathbb{B}^n \to \mathbb{B}$
(i.e., a polynomial in n Boolean variables of arbitrary degree).

Assume that $x = \{x_i\}_{i \ge 0}$ is an m-periodic bitsequence. We say that x has
[almost] ideal local statistics of order h if [almost] every h-tuple appears the
same number of times as a subsequence of $(x_0, x_1, \ldots, x_{m+h-2})$. The following nested
property of local statistics holds: If the m-periodic sequence x has [almost]
ideal local statistics of order h, then it has also [almost] ideal local statistics
of order $1, 2, \ldots, h-1$.

Proposition 5.1. An FSR of length n cannot produce an output sequence
with [almost] ideal local statistics of order $n + 1$.

Proof: Any pattern of n consecutive bits in x determines uniquely the next
bit. Thus at most $2^n$ of the possible $2^{n+1}$ patterns of $n+1$ consecutive bits can
occur in x, hence a fraction of at least $1 - 2^n/2^{n+1} = 1/2$ of the possible
subsequences of length $n+1$ never occur in x. So almost ideal local statistics of
order $n + 1$ is not possible. □

Now we want to characterize those FSR of length n that produce ideal local 
statistics of order n. Here the notion of a so-called de Bruijn FSR turns out 
to be crucial. 

An FSR is called non-singular, if all its states lie on closed cycles in its 
state-transition diagram. Other states would be called transient states, so an 
FSR is non-singular iff it has no transient states. In other words, an FSR 
is non-singular iff every state has a unique predecessor state. A de Bruijn 
FSR is a non-singular FSR with only one cycle. Non-singularity can be
characterized algebraically by the Golomb-Welch Theorem:

Theorem 5.7. (Golomb-Welch) An FSR of length n is non-singular iff its
feedback function f satisfies

$$f(x_{j-n}, x_{j-n+1}, \ldots, x_{j-1}) = x_{j-n} + g(x_{j-n+1}, x_{j-n+2}, \ldots, x_{j-1}). \qquad (5.14)$$

Proof: We first prove that (5.14) is necessary for non-singularity. Relation
(5.14) holds iff

$$f(0, a_2, a_3, \ldots, a_n) = 1 + f(1, a_2, a_3, \ldots, a_n)$$

for all $a_2, a_3, \ldots, a_n \in \mathbb{B}$. If (5.14) is not fulfilled, then there exist
$b_1, \ldots, b_{n-1} \in \mathbb{B}$ with

$$f(0, b_1, b_2, \ldots, b_{n-1}) = f(1, b_1, b_2, \ldots, b_{n-1}) =: b_n,$$

so the state $(b_1, \ldots, b_n) \in \mathbb{B}^n$ has two predecessor states in the
state-transition diagram, hence the FSR is singular.

Now let us show sufficiency. Predecessors of state $(a_1, a_2, \ldots, a_n) \in \mathbb{B}^n$ have
the form $(b, a_1, a_2, \ldots, a_{n-1}) \in \mathbb{B}^n$ with

$$a_n = f(b, a_1, a_2, \ldots, a_{n-1}) = b + g(a_1, a_2, \ldots, a_{n-1}). \qquad (5.15)$$

Equation (5.15) has the unique solution

$$b = a_n - g(a_1, a_2, \ldots, a_{n-1}),$$

hence every state has a unique predecessor. □
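
For small n the statement of Theorem 5.7 can be checked by brute force. The following Python sketch is only an illustration; the helper names and the two example feedback functions are assumptions made here. It tests non-singularity directly (the step map on states is a bijection), tests the condition $f(0, a) \ne f(1, a)$, which is equivalent to (5.14), and lists the cycle lengths of the state-transition diagram.

```python
from itertools import product


def successor(state, f):
    """One FSR step: drop the oldest bit and append the feedback bit f(state)."""
    return state[1:] + (f(*state),)


def is_nonsingular(f, n):
    """Brute force: every state has exactly one predecessor."""
    states = list(product((0, 1), repeat=n))
    return len({successor(s, f) for s in states}) == len(states)


def satisfies_golomb_welch(f, n):
    """f(0, a) != f(1, a) for all a, i.e. f = x_1 + g(x_2, ..., x_n) over GF(2)."""
    return all(f(0, *a) != f(1, *a) for a in product((0, 1), repeat=n - 1))


def cycle_lengths(f, n):
    """Cycle lengths of the state diagram (meaningful for non-singular FSRs only)."""
    seen, lengths = set(), []
    for start in product((0, 1), repeat=n):
        length, state = 0, start
        while state not in seen:
            seen.add(state)
            state = successor(state, f)
            length += 1
        if length:
            lengths.append(length)
    return lengths


def lfsr(x1, x2, x3):
    """Linear feedback x1 + x2: one cycle of length 7 plus the all-zero state."""
    return x1 ^ x2


def de_bruijn(x1, x2, x3):
    """The same LFSR with the all-zero state spliced into the long cycle."""
    return x1 ^ x2 ^ ((1 ^ x2) & (1 ^ x3))


print(cycle_lengths(lfsr, 3))       # [1, 7]
print(cycle_lengths(de_bruijn, 3))  # [8]
```

The second example has a single cycle through all $2^3$ states, so it is a de Bruijn FSR in the sense defined above.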

Corollary 5.2. The period of the output sequence of an FSR of length n is
at most $2^n$ with equality iff the FSR is a de Bruijn FSR. (In this case, the
output sequence will be called a de Bruijn sequence.)

Now the following property holds: 

Theorem 5.8. The output sequence s of an FSR of length n is a periodic 
sequence with ideal local statistics of order n iff the FSR is a de Bruijn FSR. 

Proof: Since there are $2^n$ different n-tuples and since the output sequence
of an FSR has period $m \le 2^n$, every n-tuple can appear in the subsequence
$(s_0, s_1, \ldots, s_{m+n-2})$ only if $m = 2^n$. On the other hand, if $m = 2^n$, then every n-tuple
has to appear exactly once in $(s_0, s_1, \ldots, s_{m+n-2})$. Hence $m = 2^n$ is a necessary and
sufficient condition for s to have ideal local statistics of order n. But $m = 2^n$
means that the FSR is a de Bruijn FSR. □

Here, the model of the key generator is the following: There are m LFSR
yielding the outputs $\{x_i^{(j)}\}_{i \ge 0}$ ($1 \le j \le m$). These LFSR streams are gathered
by a non-linear combining function $f : \mathbb{B}^m \to \mathbb{B}$ to yield the output

$$z_i = f(x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(m)}).$$

On a "short time" basis, the LFSR outputs may be well modeled by m
independent symmetric binary sources. Now take for example $m = 3$ and

$$f(x_1, x_2, x_3) := x_1 x_2 + x_1 x_3 + x_2 x_3.$$

One sees that if the LFSR are modeled as above, then $P(z_i = 0) = P(z_i = 1) = 1/2$.
But, if the LFSR 1 (i.e., that which generates the sequence
$\{x_i^{(1)}\}_{i \ge 0}$) is known, we can mount the following correlation attack to find
the true phase of LFSR 1 and thus find its initial state: If we multiply $\{z_i\}_{i \ge 0}$
with the shifted sequence of LFSR 1 in the "correct" phase, then we see from
the definition of f that the 1 occurs with probability 3/8 instead of just 1/4
as it would be with true binary symmetric random sequences. This type of
attack can be done for every LFSR.
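
The two probabilities 3/8 and 1/4 can be verified by enumerating all inputs of the combining function. The sketch below does this in Python with exact fractions; the variable names are chosen here for illustration, and the LFSR outputs are modeled as independent unbiased bits, as in the text.

```python
from fractions import Fraction
from itertools import product


def f(x1, x2, x3):
    """Combining function x1*x2 + x1*x3 + x2*x3 over GF(2) (the majority of three bits)."""
    return (x1 * x2 + x1 * x3 + x2 * x3) % 2


inputs = list(product((0, 1), repeat=3))

# P(z = 1) when all three inputs are independent unbiased bits
p_z_one = Fraction(sum(f(*x) for x in inputs), len(inputs))

# P(z * x1 = 1) in the correct phase: z is multiplied with the very bit x1 that entered f
p_correct_phase = Fraction(sum(f(*x) * x[0] for x in inputs), len(inputs))

# P(z * u = 1) in a wrong phase: u is modeled as an unbiased bit independent of z
p_wrong_phase = Fraction(
    sum(f(*x) * u for x in inputs for u in (0, 1)), 2 * len(inputs)
)

print(p_z_one, p_correct_phase, p_wrong_phase)  # 1/2 3/8 1/4
```

The gap between 3/8 and 1/4 is what the attacker detects when sliding the known LFSR sequence along the key stream.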

So a natural question is how to choose the combining function f to avoid
such attacks.

Definition 5.2. A function $f : \mathbb{B}^m \to \mathbb{B}$ is called h-th order correlation
immune if, whenever $X_1, X_2, \ldots, X_m$ are independent unbiased $\mathbb{B}$-valued
random variables, then $Z := f(X_1, X_2, \ldots, X_m)$ is independent of all finite
subsequences $(X_{i_1}, X_{i_2}, \ldots, X_{i_h})$ ($1 \le i_1 < \ldots < i_h \le m$).

The significance of h-th order correlation immunity lies in the fact that if a
non-linear combination function f is h-th order correlation immune, then it
is not possible to mount a correlation attack on any combination of h input
sequences.

For a function $f : \mathbb{B}^m \to \mathbb{B}$, its Fourier (or Walsh-Hadamard) transform is
defined as

$$F(\omega) := \sum_{x \in \mathbb{B}^m} f(x)(-1)^{\langle x, \omega \rangle} \qquad (\omega \in \mathbb{B}^m).$$

One has the following inversion formula:

$$f(x) = 2^{-m} \sum_{\omega \in \mathbb{B}^m} F(\omega)(-1)^{\langle x, \omega \rangle}.$$

Now correlation immunity can be characterized in terms of Fourier transforms
as follows:

Theorem 5.9. (Xiao-Massey Spectral Test) The function $f : \mathbb{B}^m \to \mathbb{B}$
is h-th order correlation immune iff its Fourier transform F satisfies

$$F(\omega_1, \omega_2, \ldots, \omega_m) = 0$$

for all $\omega = (\omega_1, \omega_2, \ldots, \omega_m) \in \mathbb{B}^m$ with $1 \le w_H(\omega) \le h$ (where $w_H(\omega)$
denotes the Hamming weight (i.e. the number of entries 1) of the vector $\omega$).

The proof of Theorem 5.9 follows from the following two lemmas:

Lemma 5.7. Let X be a random vector consisting of m independent unbiased
$\mathbb{B}$-valued random variables $X_1, X_2, \ldots, X_m$, $f : \mathbb{B}^m \to \mathbb{B}$, $\omega \in \mathbb{B}^m \setminus \{0\}$ and
put $Z := f(X_1, X_2, \ldots, X_m)$. Then Z is independent of $\langle X, \omega \rangle$ iff $F(\omega) = 0$.

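Proof: Since

$$P_{Z|\langle X, \omega \rangle}(1|b) = \frac{|\{x \in \mathbb{B}^m : f(x) = 1, \langle x, \omega \rangle = b\}|}{|\{x \in \mathbb{B}^m : \langle x, \omega \rangle = b\}|} = 2^{-(m-1)} \sum_{x \in \mathbb{B}^m : \langle x, \omega \rangle = b} f(x),$$

we get

$$P_{Z|\langle X, \omega \rangle}(1|0) - P_{Z|\langle X, \omega \rangle}(1|1) = 2^{-(m-1)} \sum_{x \in \mathbb{B}^m} f(x)(-1)^{\langle x, \omega \rangle} = 2^{-(m-1)} F(\omega).$$

Since Z is binary, Z is independent of $\langle X, \omega \rangle$ iff this difference vanishes,
i.e. iff $F(\omega) = 0$. □

Lemma 5.8. A discrete random variable Z is independent of the random
vector $Y = (Y_1, Y_2, \ldots, Y_m) \in \mathbb{B}^m$ iff for every $a \in \mathbb{B}^m$, Z is independent of
$\langle Y, a \rangle$.

The proof follows directly from considering Fourier transforms (see
Brynielsson (1989)). □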
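
The spectral test of Theorem 5.9 is easy to try out on small examples. The following Python sketch is an illustration under assumptions made here: a function f is represented by its truth table indexed by the integer whose binary digits are the input bits, the transform is computed by the usual in-place butterfly scheme, and the function names are not taken from the cited literature.

```python
def walsh_hadamard(truth_table):
    """F(w) = sum_x f(x) * (-1)^<x,w> for truth_table[x] = f(x), len = 2^m."""
    F = list(truth_table)
    step = 1
    while step < len(F):
        for start in range(0, len(F), 2 * step):
            for i in range(start, start + step):
                F[i], F[i + step] = F[i] + F[i + step], F[i] - F[i + step]
        step *= 2
    return F


def correlation_immunity_order(truth_table):
    """Largest h such that F(w) = 0 for all w with 1 <= Hamming weight <= h."""
    F = walsh_hadamard(truth_table)
    m = (len(F) - 1).bit_length()
    h = 0
    for weight in range(1, m + 1):
        if any(F[w] != 0 for w in range(1, len(F)) if bin(w).count("1") == weight):
            break
        h = weight
    return h


majority = [0, 0, 0, 1, 0, 1, 1, 1]  # x1*x2 + x1*x3 + x2*x3
parity = [0, 1, 1, 0, 1, 0, 0, 1]    # x1 + x2 + x3

print(walsh_hadamard(majority))              # [4, -2, -2, 0, -2, 0, 0, 2]
print(correlation_immunity_order(majority))  # 0
print(correlation_immunity_order(parity))    # 2
```

The majority function of the earlier correlation attack has non-zero spectrum at all vectors of weight 1, which is exactly why each single LFSR could be attacked; the parity function is second-order correlation immune.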

We point out that Theorem 5.9 is really applicable in practice, since to
compute the Fourier transform needs at most $O(m 2^m)$ additions and subtractions
(see Massey (1997), p. 3.63). However, high-order correlation immunity
cannot happen if the nonlinear order $\lambda$ of the function f is too high. Let us
explain this in detail: Let $f : \mathbb{B}^m \to \mathbb{B}$. Then the so-called algebraic normal
form of the function f is

$$f(x_1, x_2, \ldots, x_m) = a_0 + a_1 x_1 + a_2 x_2 + \ldots + a_m x_m + a_{1,2} x_1 x_2 + a_{1,3} x_1 x_3 + \ldots + a_{1,2,\ldots,m} x_1 x_2 \cdots x_m,$$

where the coefficients are given by the inversion formula

$$a_{1,2,\ldots,k} = \sum_{x \in S_{1,2,\ldots,k}} f(x_1, x_2, \ldots, x_m) \qquad (5.16)$$

and

$$S_{1,2,\ldots,k} = \begin{cases} \{x : x_{k+1} = x_{k+2} = \ldots = x_m = 0\} & : 1 \le k \le m-1 \\ \{x\} = \mathbb{B}^m & : k = m, \end{cases} \qquad (5.17)$$

etc. (see Siegenthaler (1984)).

Definition 5.3. The nonlinear order $\lambda$ of a function $f : \mathbb{B}^m \to \mathbb{B}$ is the
maximum number of variables $x_j$ that occur in a term of the algebraic normal
form of f.

Theorem 5.10. (Siegenthaler's Inequality) If $\lambda$ denotes the nonlinear
order of the function $f : \mathbb{B}^m \to \mathbb{B}$ and if f is h-th order correlation immune,
then

$$h \le m - \lambda.$$

Proof: Assume f is h-th order correlation immune for some $h \in \{1, 2, \ldots, m-1\}$.
We show that no product of $m - h + 1$ or more variables $x_j$ can occur in
the algebraic normal form of f. Define the numbers

$$N_{1,2,\ldots,k} = |\{x \in \mathbb{B}^m : x \in S_{1,2,\ldots,k}, f(x) = 1\}|. \qquad (5.18)$$

Let $Z := f(X)$ where X is a vector of m independent unbiased $\mathbb{B}$-valued
random variables. Then we get

$$P(Z = 1 \mid X_{k+1} = X_{k+2} = \ldots = X_m = 0) = \frac{N_{1,2,\ldots,k}}{2^k} \qquad (1 \le k \le m-1) \qquad (5.19)$$

and

$$P(Z = 1) = \frac{N_{1,2,\ldots,m}}{2^m}. \qquad (5.20)$$

We obtain

$$P(Z = 1 \mid X_{k+1} = X_{k+2} = \ldots = X_m = 0) = P(Z = 1) \qquad (m - h \le k \le m - 1)$$

and hence from (5.19) and (5.20)

$$\frac{N_{1,2,\ldots,m}}{2^m} = \frac{N_{1,2,\ldots,m-1}}{2^{m-1}} = \ldots = \frac{N_{1,2,\ldots,m-h}}{2^{m-h}},$$

which implies

$$N_{1,2,\ldots,k} = 2^{k-m+h} N_{1,2,\ldots,m-h} \qquad (m - h \le k \le m). \qquad (5.21)$$

From (5.21), for $m - h + 1 \le k \le m$, these numbers must be even, which
implies, from (5.16) and (5.17)

$$a_{1,2,\ldots,k} = 0 \qquad (m - h + 1 \le k \le m).$$

However, this argument not only applies to the first k components of x, but
to any k components of x, which proves the assertion. □
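
Siegenthaler's Inequality can be checked exhaustively for very small m. The Python sketch below is an illustration only: truth tables are indexed by the integer whose bit i is the variable $x_{i+1}$, the algebraic normal form is obtained by the binary Möbius transform (which evaluates sums of the type (5.16) for every monomial at once), and correlation immunity is tested directly from Definition 5.2. All function names are assumptions made here.

```python
from itertools import combinations, product


def anf(truth_table):
    """ANF coefficients: entry u is the coefficient of the product of x_{i+1}, i in bits(u)."""
    a = list(truth_table)
    step = 1
    while step < len(a):
        for start in range(0, len(a), 2 * step):
            for i in range(start, start + step):
                a[i + step] ^= a[i]
        step *= 2
    return a


def nonlinear_order(truth_table):
    """Maximum number of variables in a monomial of the ANF (0 for constants)."""
    return max(
        (bin(u).count("1") for u, c in enumerate(anf(truth_table)) if c), default=0
    )


def is_correlation_immune(truth_table, m, h):
    """Definition 5.2: Z is independent of every choice of h input variables."""
    ones = sum(truth_table)
    for positions in combinations(range(m), h):
        for values in product((0, 1), repeat=h):
            count = sum(
                truth_table[x]
                for x in range(2 ** m)
                if all((x >> i) & 1 == v for i, v in zip(positions, values))
            )
            # P(Z=1 | fixed inputs) = count / 2^(m-h) must equal ones / 2^m
            if count * 2 ** h != ones:
                return False
    return True


majority = [0, 0, 0, 1, 0, 1, 1, 1]
parity = [0, 1, 1, 0, 1, 0, 0, 1]
print(nonlinear_order(majority), nonlinear_order(parity))  # 2 1
print(is_correlation_immune(parity, 3, 2))                 # True
```

For the parity function the bound is attained ($h = 2 = m - \lambda$), whereas the majority function has $\lambda = 2$ and is not even first-order correlation immune.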

The tradeoff given by Siegenthaler's Inequality does not exist if the combining
function f is allowed to have memory. We will not pursue this track further
and only refer to Rueppel (1986), Chapter 9.

Further seminal papers on correlation attacks are e.g. Chepyzhov, Smeets 
(1991) and Meier, Staffelbach (1989), (1991), (1992). 

5.6 The Quadratic Congruential Generator 

Now we will consider a special example of the above Blum-Micali generator,
namely the quadratic congruential generator. Its implementation can be done
by an (especially simple) non-linear shift register of length 1. Assume n is
a Blum integer, i.e., a product of two distinct odd primes p and q both
congruent to 3 (mod 4). (In particular, the factoring of Blum integers is
believed to be computationally hard.) Let k be the length of the binary
expansion of n ($k := |n|$). For an integer x, define the "absolute value" mod n
by

$$|x|_n := \begin{cases} x \ (\mathrm{mod}\ n) & : x \ (\mathrm{mod}\ n) < n/2 \\ n - (x \ (\mathrm{mod}\ n)) & : x \ (\mathrm{mod}\ n) > n/2. \end{cases}$$

Then take, as permutation $f(.) = f_n(.)$, the "absolute value" of the square:

$$f_n(x) := |x^2 \ (\mathrm{mod}\ n)|_n.$$

(By Euler's criterion (mentioned in Section 2.2) for prime factors of Blum
integers we have that

$$((n-y)|n) = (-y|n) = (-1|n)(y|n) = (-1|p)(-1|q)(y|n) = (y|n).$$

Hence exactly one square root with Legendre-Jacobi symbol equal to 1 of a
quadratic residue modulo n is less than n/2, since every quadratic residue
modulo n has exactly two square roots with Legendre-Jacobi symbol equal
to 1, and these are negatives of each other. So f is really a permutation of the set

$$S = \{x \in \mathbb{Z}_n^* : 0 < x < n/2, (x|n) = 1\}.)$$
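
For a toy Blum integer the permutation property can be confirmed by enumeration. The Python sketch below is an illustration under assumptions made here: the modulus $n = 77 = 7 \cdot 11$ is far too small for any cryptographic use, the function names are invented for this example, and the output bits are taken as the least significant bits of the successive states, which corresponds to the hard-core bit $\mathrm{Lsb}(f_n^{-1}(.))$ considered in this section.

```python
from math import gcd


def jacobi(a, n):
    """Legendre-Jacobi symbol (a|n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def abs_mod(x, n):
    """The "absolute value" |x|_n of x modulo an odd n."""
    r = x % n
    return r if 2 * r < n else n - r


def f_n(x, n):
    """f_n(x) = |x^2 (mod n)|_n."""
    return abs_mod(x * x, n)


def domain(n):
    """S = {x in Z_n^* : 0 < x < n/2, (x|n) = 1}."""
    return [x for x in range(1, (n + 1) // 2) if gcd(x, n) == 1 and jacobi(x, n) == 1]


def qcg_bits(seed, n, count):
    """Bits Lsb(x), Lsb(f_n(x)), Lsb(f_n(f_n(x))), ... for x = seed in S."""
    bits, x = [], seed
    for _ in range(count):
        bits.append(x & 1)
        x = f_n(x, n)
    return bits


n = 77  # toy Blum integer: 7 and 11 are both congruent to 3 (mod 4)
S = domain(n)
print(sorted(f_n(x, n) for x in S) == S)  # True: f_n permutes S
print(qcg_bits(4, n, 6))                  # [0, 0, 1, 1, 0, 0]
```

With such a small modulus the state returns to the seed after four steps (4, 16, 25, 9, 4, ...), which shows why the size of n matters in practice.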

Using this permutation in the Blum-Micali construction will be called the 
quadratic congruential generator. The main aim of this section will be to 
prove and discuss the following important fact: 

Theorem 5.11. Breaking the quadratic congruential generator is probabilis- 
tic polynomial-time equivalent to the factoring of n. 

First we will show the following: 

Theorem 5.12. Inverting $f_n(.)$ is probabilistic polynomial-time equivalent to
factoring $n = pq$ (p, q primes).

Proof: Given as input a square $z = y^2 \ (\mathrm{mod}\ n)$ with $0 < z < n/2$, where a
square root y with $(y|n) = -1$ is known, a probabilistic polynomial-time
algorithm A that inverts $f_n(.)$ will output a square root x of z in the domain
of $f_n(.)$ with probability greater than, say, $1/\delta(|n|)$ (where $|n|$ is the length
of the binary expansion of n and $\delta$ is some polynomial). Since $(x|n) = 1$ (by
the definition of the domain of $f_n$), but also $(-1|n) = 1$, it is not possible
that $x = \pm y \ (\mathrm{mod}\ n)$. But

$$pq \mid x^2 - y^2 = (x - y)(x + y),$$

which entails that exactly one of the two primes p, q divides $x + y$ evenly.
From the Euclidean algorithm one can compute $\gcd(n, x + y)$ in polynomial
time, which yields the factors p and q. The probability that a randomly chosen
$y \in \mathbb{Z}_n^*$ satisfies indeed $(y|n) = -1$ and also the probability that $0 < z < n/2$
are both 1/2. If y is a randomly (with uniform distribution) selected element
of the set $\{y \in \mathbb{Z}_n^* : (y|n) = -1\}$, then also z is uniformly distributed on the
set of all quadratic residues mod n. So one can run the following probabilistic
algorithm: Generate at random (with uniform distribution) an element
$y \in \mathbb{Z}_n^*$. Repeat this until for $z := y^2 \ (\mathrm{mod}\ n)$ one has $0 < z < n/2$. Then
input z to the inverting algorithm A. Check if $A(z)^2 = z \ (\mathrm{mod}\ n)$. The mean
number of times this has to be done until one finds a square root that allows
to factor n is $2\delta(|n|)$. If A is polynomial, the whole calculation proceeds in
expected polynomial time. □
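
The central step of this proof, namely that two essentially different square roots of the same residue reveal a factor of n, can be illustrated as follows. This Python sketch rests on assumptions made here: the modulus is the toy value $n = 77$, the function names are invented, and the exhaustive search in `square_roots` merely stands in for the hypothetical inverting algorithm A.

```python
from math import gcd


def factor_from_roots(n, x, y):
    """Given x^2 = y^2 (mod n) with x != +-y (mod n), return a nontrivial factor of n."""
    if (x * x - y * y) % n != 0:
        raise ValueError("x and y are not square roots of the same residue")
    d = gcd(n, x + y)
    if d in (1, n):
        raise ValueError("x = +-y (mod n): these roots give no information")
    return d


def square_roots(z, n):
    """All square roots of z modulo n by exhaustive search (toy stand-in for A)."""
    return [x for x in range(n) if (x * x - z) % n == 0]


n = 77
y = 2                        # (2|77) = -1, and z = y^2 = 4 < n/2
z = (y * y) % n
print(square_roots(z, n))    # [2, 9, 68, 75]
x = 9                        # the root of z with (9|77) = 1 and 9 < n/2
print(factor_from_roots(n, x, y), n // factor_from_roots(n, x, y))  # 11 7
```

Here 9 is the root that lies in the domain of $f_n$, so it is the answer an inverter would return, and combining it with the known root 2 factors n.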

Proof of Theorem 5.11: 1. The strategy of proof will be to show that
$\mathrm{Lsb}(f_n^{-1}(.))$, the least significant bit of $f_n^{-1}(.)$, is hard-core and then to use
Theorem 5.12. In other words, we will prove that under the hypothesis that
there is a probabilistic polynomial-time algorithm that can guess the least
significant bit of $f_n^{-1}(.)$ with probability non-negligibly greater than 1/2, one can
construct a probabilistic polynomial-time algorithm for inverting $f_n(.)$. So
there is some similarity to Section 2.4. Let O denote an oracle that takes
as input n and an x in the range of $f_n(.)$ and yields as output a guess for
$\mathrm{Lsb}(f_n^{-1}(x))$ that is correct with probability $1/2 + |n|^{-c}$ for some constant c.
Now the method for constructing a probabilistic polynomial-time algorithm
for inverting $f_n(.)$ will be to call the oracle O at most polynomially many
times to find $f_n^{-1}(x)$ with the aid of a gcd-algorithm that makes all its
computations based solely on the least significant bits of all involved integers
(so that we can use O). This can be done with the Brent-Kung algorithm,
which we will describe later and of which we will show that it indeed yields the
correct answer with a probability lower-bounded by the inverse of some
polynomial in k, so that the experiment described in the following needs to be
repeated only an expected number of times that is polynomial in k. Using the
Brent-Kung algorithm for calculating greatest common divisors, we compute,
for randomly chosen a, b, the greatest common divisor $\gcd([ax]_n, [bx]_n)$ based
on the permuted values $f_n(ax \ (\mathrm{mod}\ n))$ and $f_n(bx \ (\mathrm{mod}\ n))$, where

$$[z]_n := \begin{cases} z \ (\mathrm{mod}\ n) & : z \ (\mathrm{mod}\ n) < n/2 \\ z \ (\mathrm{mod}\ n) - n & : z \ (\mathrm{mod}\ n) > n/2. \end{cases}$$

When we have finished the Brent-Kung algorithm, we will be in possession
of a representation $[dx]_n := \gcd([ax]_n, [bx]_n)$ of the latter gcd, hence d and
$f_n(dx \ (\mathrm{mod}\ n)) = f_n([dx]_n)$ are known. If $[ax]_n$ and $[bx]_n$ are relatively prime
(an event whose probability tends asymptotically to $6/\pi^2$ as $n \to \infty$ due to
a theorem of Dirichlet), then it follows that

$$[dx]_n = \pm 1 \qquad (5.22)$$

and therefore $f_n(dx) = 1$. If we check $f_n(x) = f_n(\pm d^{-1} \ (\mathrm{mod}\ n))$ (note that
the Euclidean algorithm calculates inverses in polynomial time without knowing
the prime factorization of n) and find that these two values are indeed
equal, then we have a good probability that one of the values $\pm d^{-1} \ (\mathrm{mod}\ n)$
lies indeed in the domain of $f_n(.)$ and we are finished. Otherwise, repeat the
experiment sufficiently (probabilistically polynomially) many times.

2. Now we turn to the description of the Brent-Kung algorithm:

Given two integers A (odd) and B with lengths $\le |n|$, repeat the following
steps until $B = 0$:

- While $\mathrm{Lsb}(|B|) = 0$, do $B \leftarrow B/2$; $\mathrm{length}(B) \leftarrow \mathrm{length}(B) - 1$.

- If $\mathrm{length}(B) < \mathrm{length}(A)$, then swap(A, B); swap(length(A), length(B)).

- If $\mathrm{Lsb}(|(A + B)/2|) = 0$, then $B \leftarrow (A + B)/2$; else $B \leftarrow (A - B)/2$.

(If A is even, then in step 3 the expression $\mathrm{Lsb}((A + B)/2)$ makes no sense.
But in this case we can evidently reduce A before, so that it will become
odd.)

(This Brent-Kung algorithm rests on the following facts:

(i) If a, b are both even, then $\gcd(a, b) = 2\gcd(a/2, b/2)$.

(ii) If a is odd and b is even, then $\gcd(a, b) = \gcd(a, b/2)$.

(iii) If a, b are both odd, then $\gcd(a, b) = \gcd(a, (a+b)/2) = \gcd(a, (a-b)/2)$.)

After halting of the Brent-Kung algorithm (i.e. $B = 0$), the variable (memory
cell) A contains the gcd of the two original input numbers A and B. One
counts that the maximal number of evaluations of a least significant bit in
the Brent-Kung algorithm is $O(|n|)$.
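
The three steps above translate almost literally into code. The following Python sketch is a plain-integer illustration under assumptions made here: it works on the actual numbers rather than through the oracle O, it uses Python's `bit_length` of the absolute value as the "length", it returns the absolute value of A at the end (A may become negative during the run), and the name `lsb_gcd` is invented for this example.

```python
def lsb_gcd(a, b):
    """gcd(a, b) for odd a, using only parity tests, halvings, sums and differences."""
    if a % 2 == 0:
        raise ValueError("a must be odd")
    while b != 0:
        # Step 1: remove factors 2 from b (b is nonzero here, so this terminates)
        while b % 2 == 0:
            b //= 2
        # Step 2: keep the shorter number in a
        if abs(b).bit_length() < abs(a).bit_length():
            a, b = b, a
        # Step 3: a and b are both odd; exactly one of (a+b)/2, (a-b)/2 is even
        half_sum = (a + b) // 2
        b = half_sum if half_sum % 2 == 0 else (a - b) // 2
    return abs(a)


print(lsb_gcd(9, 6))      # 3
print(lsb_gcd(35, 21))    # 7
print(lsb_gcd(15, -25))   # 5
```

Every decision in the loop depends only on a least significant bit or on a length comparison, which is the property that allows the oracle O to drive the computation in the proof.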

3. Let us now explain how the Brent-Kung algorithm is used in our problem.
In our application, we must put $A := [ax]_n$ and $B := [bx]_n$ and the algorithm
will work only with $f_n(ax \ (\mathrm{mod}\ n))$ and $f_n(bx \ (\mathrm{mod}\ n))$ with the aid of the
oracle O. We first define the so-called parity by

$$\mathrm{par}(bx \ (\mathrm{mod}\ n)) := \mathrm{Lsb}(|B|) = \mathrm{Lsb}(|[bx]_n|). \qquad (5.23)$$

The bit $\mathrm{Lsb}(|B|)$ is what we really want to know at the end, so from (5.23)
we must look for a procedure that calculates the parity, using the oracle O.

4. The parity algorithm (sketch): The basic principle here is to determine
$\mathrm{par}(dx \ (\mathrm{mod}\ n))$ by comparing $\mathrm{Lsb}(s)$ with $\mathrm{Lsb}(s + dx \ (\mathrm{mod}\ n))$ for
randomly chosen $s \in \mathbb{Z}_{\lfloor n/2 \rfloor}$ (with uniform distribution). If no "wraparound 0"
occurs by adding s to dx, then one has the relation

$$\mathrm{par}_n(dx \ (\mathrm{mod}\ n)) = \mathrm{Lsb}(s) + \mathrm{Lsb}(s + dx \ (\mathrm{mod}\ n)) \ (\mathrm{mod}\ 2). \qquad (5.24)$$

The probability of a "wraparound 0" can be shown to be small. If one
chooses s at random as described above, then unfortunately the values of
$(s + dx) \ (\mathrm{mod}\ n)$ are not known. For how to overcome this difficulty and for
further details, in particular the control of possible errors, we refer to Brands,
Gill (1996). □

6.1 Entropy and Coding

In this section, we will introduce one of the most important notions in
cryptology, namely the information content and the entropy. The entropy of a
random variable X that can assume the n different values $x_1, x_2, \ldots, x_n$ with
the respective probabilities $p_1, p_2, \ldots, p_n$ is defined as

$$H(X) = H(p_1, p_2, \ldots, p_n) = \sum_{i=1}^n p_i \log_2(1/p_i) = -\sum_{i=1}^n p_i \log_2 p_i.$$

If one considers, e.g., a decision tree, then one sees easily that $\log_2(1/p_i)$ (unit:
"bit") may be interpreted as the information content of the realization $x_i$ of
the random variable X, i.e. the entropy is the average information content
of a realization of X. Actually, the entropy only depends on the probabilities
$p_1, p_2, \ldots, p_n$ and not on the realizations $x_1, x_2, \ldots, x_n$ themselves. If we define

$$\mathcal{X} := \{x_1, x_2, \ldots, x_n\}$$

as the alphabet, then, instead of $H(X)$, we will sometimes also write $H(\mathcal{X})$.
In coding theory, the entropy is (loosely speaking) the optimal average length
of a codeword in an alphabet with two letters, as we will see in the following.

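Lemma 6.1. If for $p_i, q_i > 0$ we have the inequality $\sum_{i=1}^n q_i \le \sum_{i=1}^n p_i$, then
it follows that

$$-\sum_{i=1}^n p_i \log_2 p_i \le -\sum_{i=1}^n p_i \log_2 q_i.$$

Proof: Since $\log x \le x - 1$ we have $\log_2 x \le (x - 1)/\log 2$, hence

$$\sum_{i=1}^n p_i \log_2(q_i/p_i) \le \frac{1}{\log 2} \sum_{i=1}^n p_i \left(\frac{q_i}{p_i} - 1\right) = \frac{1}{\log 2} \left(\sum_{i=1}^n q_i - \sum_{i=1}^n p_i\right) \le 0. \ □$$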

D. Neuenschwander: Prob. and Stat. Methods in Cryptology, LNCS 3028, pp. 77-88, 2004. 

© Springer-Verlag Berlin Heidelberg 2004

If in the above lemma we put $q_i = 1/n$ ($i = 1, 2, \ldots, n$), then the inequality

$$H(p_1, p_2, \ldots, p_n) \le H\left(\tfrac{1}{n}, \tfrac{1}{n}, \ldots, \tfrac{1}{n}\right) = \log_2 n$$

follows. Hence the entropy of a random variable with n possible values
becomes maximal if these n values are uniformly distributed, and in this case
it assumes the value $\log_2 n$.

Given an alphabet $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$ with the n different letters $x_1, x_2,
\ldots, x_n$ (so, e.g., for the Latin alphabet we have $n = 26$ and $x_1$ = "A", $x_2$ =
"B", ..., $x_{26}$ = "Z"), then a (binary) encoding is a map

$$C : x_i \mapsto c_i,$$

which assigns a (finite) bitsequence $c_i$ to every letter $x_i$ from the alphabet
$\mathcal{X}$ such that the condition of decodability (or "unique decoding") is fulfilled:
Two different sequences of letters of plaintext must yield different codes. A
sharper condition is irreducibility (Fano condition): No codeword is allowed
to be the beginning of another codeword. Let $\ell_i$ denote the length of the
codeword for the letter $x_i$. If $p_i$ is the probability of occurrence of letter $x_i$,
then

$$\bar{\ell} := \sum_{i=1}^n p_i \ell_i$$

is the average length of a codeword.

Lemma 6.2. (Kraft's inequality) For a decodable code we have

$$\sum_{i=1}^n 2^{-\ell_i} \le 1.$$

Proof: The proof uses generating functions of sequences. Let $a_k$ be the
number of letters from the alphabet $\mathcal{X}$ with a codeword of length k and $b_m$ the
number of sequences of letters (from $\mathcal{X}$) with a codeword of length m. Then
we have

$$b_m = \sum_{k=1}^m a_k b_{m-k}$$

(where $b_0 := 1$). If we consider the generating functions of $\{b_m\}_{m \ge 1}$ and $\{a_k\}_{k \ge 1}$,

$$g(z) = \sum_{m=1}^\infty b_m z^m \qquad (|z| < 1/2)$$

(convergent since $b_m \le 2^m$ due to the "unique decoding condition") and

$$f(z) = \sum_{k=1}^\infty a_k z^k \qquad (z \in \mathbb{R})$$

(polynomial!), then we obtain

$$g(z) = \sum_{m=1}^\infty b_m z^m = \sum_{m=1}^\infty \sum_{k=1}^m a_k b_{m-k} z^m = \sum_{k=1}^\infty a_k z^k \sum_{i=0}^\infty b_i z^i = f(z)(g(z) + 1)$$

(where we have put $i = m - k$). Hence

$$f(z) = \frac{g(z)}{g(z) + 1}.$$

Since $g(z) \ge 0$ ($0 \le z < 1/2$), we have that the polynomial $f(z) < 1$ ($0 \le
z < 1/2$), hence $f(1/2) \le 1$ due to continuity, which yields the assertion. □

On the other hand, it holds that



Lemma 6.3. For given $\ell_1, \ell_2, \ldots, \ell_n \in \mathbb{N}$ that obey Kraft's inequality, there
exists an irreducible code such that the codeword $c_i$ is of length $\ell_i$.

Proof: The proof is of combinatorial nature. Let $I_k$ be the set of all $i \in
\{1, 2, \ldots, n\}$ such that $\ell_i = k$ and denote by $a_k$ the number of elements of $I_k$.
Thus the letters $x_i$ for which $i \in I_1$ can be encoded by codewords of length
1. Now we proceed by recursion. Assume all letters having a codeword of
maximal length $k - 1$ are encoded. Then one has used, for $j < k$, always $a_j$
bitsequences of length j that are initial strings of always $2^{k-j}$ bitsequences
of length k. Thus under the condition of irreducibility,

$$2^k - \sum_{j=1}^{k-1} a_j 2^{k-j} = 2^k \left(1 - \sum_{j=1}^{k-1} a_j 2^{-j}\right)$$

bitsequences still remain at our disposal. Since Kraft's inequality must hold,
the latter quantity is $\ge 2^k (a_k 2^{-k}) = a_k$, so that enough codewords remain
for also encoding all letters $x_i$ with $i \in I_k$. □
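
The recursion in this proof can be turned into a short construction. The Python sketch below is one possible realization under assumptions made here: it assigns codewords in order of increasing length, always taking the numerically smallest bitstring that is still free (the so-called canonical assignment), and the function names are invented for this example.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of the sum of 2^(-l_i)."""
    return sum(Fraction(1, 2 ** length) for length in lengths)


def irreducible_code(lengths):
    """Codewords (as bitstrings) with the given lengths, none a prefix of another."""
    if kraft_sum(lengths) > 1:
        raise ValueError("Kraft's inequality is violated")
    code = [None] * len(lengths)
    next_value, previous_length = 0, 0
    for i in sorted(range(len(lengths)), key=lambda i: lengths[i]):
        # extend the next free bitstring to the required length
        next_value <<= lengths[i] - previous_length
        code[i] = format(next_value, "0{}b".format(lengths[i]))
        next_value += 1
        previous_length = lengths[i]
    return code


print(kraft_sum([2, 1, 3, 3]))         # 1
print(irreducible_code([2, 1, 3, 3]))  # ['10', '0', '110', '111']
```

Lengths such as (1, 1, 2) violate Kraft's inequality, and the function then raises an error instead of returning a code.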

Let us assume that the letters $x_i$ of the alphabet $\mathcal{X}$ occur with the respective
probabilities $p_i$ and put $H(\mathcal{X}) = H(p_1, p_2, \ldots, p_n)$.

Theorem 6.1. (Coding Theorem) (i) Every decodable encoding of the
alphabet $\mathcal{X}$ has an average codeword length

$$\bar{\ell} \ge H(\mathcal{X}).$$

(ii) On the other hand there exists an irreducible code with average codeword
length

$$\bar{\ell} < H(\mathcal{X}) + 1.$$

6 An Information Theory Primer 



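Proof: (i) From Kraft's inequality and Lemma 6.1 (by putting $q_i = 2^{-\ell_i}$) it
follows that

$$H(\mathcal{X}) = -\sum_{i=1}^n p_i \log_2 p_i \le -\sum_{i=1}^n p_i \log_2 q_i = \sum_{i=1}^n p_i \ell_i = \bar{\ell}.$$

(ii) Let $\ell_i = -\lfloor \log_2 p_i \rfloor$. Then we have

$$2^{-\ell_i} \le 2^{\log_2 p_i} = p_i,$$

which entails Kraft's inequality. Now Lemma 6.3 yields the existence of an
irreducible code with these $\ell_i$. On the other hand, one sees that

$$\bar{\ell} = \sum_{i=1}^n p_i \ell_i < \sum_{i=1}^n p_i (-\log_2 p_i + 1) = H(\mathcal{X}) + 1. \ □$$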
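
The two bounds of the Coding Theorem can be observed numerically with the codeword lengths used in part (ii) of the proof. The Python sketch below is an illustration with function names and example distributions chosen here; it uses floating-point logarithms, so the comparisons are only as exact as that arithmetic.

```python
from math import ceil, log2


def entropy(probabilities):
    """H(p_1, ..., p_n) in bits."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def proof_lengths(probabilities):
    """The lengths l_i = -floor(log2 p_i) = ceil(-log2 p_i) from part (ii) of the proof."""
    return [ceil(-log2(p)) for p in probabilities]


def average_length(probabilities, lengths):
    return sum(p * length for p, length in zip(probabilities, lengths))


for probabilities in ([0.5, 0.25, 0.125, 0.125], [0.4, 0.3, 0.2, 0.1]):
    lengths = proof_lengths(probabilities)
    H = entropy(probabilities)
    mean = average_length(probabilities, lengths)
    kraft = sum(2.0 ** -length for length in lengths)
    print(lengths, round(H, 4), round(mean, 4), kraft)
# [1, 2, 3, 3] 1.75 1.75 1.0
# [2, 2, 3, 4] 1.8464 2.4 0.6875
```

For the dyadic distribution the average length equals the entropy; for the second one it lies strictly between $H(\mathcal{X})$ and $H(\mathcal{X}) + 1$, and the Kraft sum is smaller than 1, so these lengths are not optimal.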

To find such an optimal code explicitly is another problem of coding theory, 
which will not be considered here. It has to do with so-called Huffman trees, 
which are rooted binary trees all of whose non-leaves are arranged, from left 
to right, in order of non-decreasing distance from the root. 

Another (less known) complexity measure is the so-called marginal guesswork.
It has nothing to do with the entropy in the sense that there are no
general inequalities relating these two measures to one another. This approach
will be presented in Section 6.3.



6.2 Relative Entropy, Mutual Information, and 
Impersonation Attack 

In this section, we will collect some results about information theory involving 
several probability measures. First, we define the ’’relative entropy” between 
two probability measures: 

Definition 6.1. Let P and Q be two probability measures on the same
alphabet $\mathcal{X}$. Then the relative entropy (information divergence, Kullback-Leibler
distance, discrimination) from P to Q is defined as

$$D(P\|Q) := -\sum_{x \in \mathcal{X}} P(x) \log_2 \frac{Q(x)}{P(x)}.$$

Note that in general $D(P\|Q) \ne D(Q\|P)$. The following lemma shows that
the word "distance" is chosen reasonably.

Lemma 6.4. D{P\\Q) > 0 with equality iff P = Q. 

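Proof: From the inequality $\log r \le r - 1$ with equality iff $r = 1$ we deduce

$$D(P\|Q) \ge -\frac{1}{\log 2} \sum_{x \in \mathcal{X}} P(x) \left(\frac{Q(x)}{P(x)} - 1\right) = \frac{1}{\log 2} \left(\sum_{x \in \mathcal{X}} P(x) - \sum_{x \in \mathcal{X}} Q(x)\right) \ge 0,$$

with equality iff $P(x) = Q(x)$ for all $x \in \mathcal{X}$. □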
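
A short numerical example makes the asymmetry and the non-negativity visible. The Python sketch below is an illustration with names chosen here; distributions are given as dictionaries, the sum runs over the letters with $P(x) > 0$, and it is assumed that $Q(x) > 0$ wherever $P(x) > 0$.

```python
from math import log2


def relative_entropy(P, Q):
    """D(P||Q) = -sum_x P(x) log2(Q(x)/P(x)); assumes Q(x) > 0 whenever P(x) > 0."""
    return -sum(p * log2(Q[x] / p) for x, p in P.items() if p > 0)


P = {"a": 0.5, "b": 0.5}
Q = {"a": 0.25, "b": 0.75}

print(round(relative_entropy(P, Q), 4))  # 0.2075
print(round(relative_entropy(Q, P), 4))  # 0.1887
print(relative_entropy(P, P) == 0)       # True
```

The two directions give different values, so the relative entropy is not a metric, although it vanishes exactly when the two distributions coincide.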

The relative entropy has important properties for hypothesis testing. This
will be used, e.g., in connection with the impersonation attack presented at
the end of this section. Consider the null hypothesis $H_0$ and the alternative
$H_1$ and let V be the decision: $V = 0$ if we decide for $H_0$ and $V = 1$ if we
decide for $H_1$. Let $\mathcal{D}_0$ be the decision region for $H_0$ and $\mathcal{D}_1$ that for $H_1$.
Consider the probability measures

$$P := P_{Y|H_0}, \qquad Q := P_{Y|H_1}.$$

One can interpret the relative entropy from P to Q as the expectation of the
log-likelihood ratio under the null hypothesis:

$$D(P\|Q) = E\left(\log_2 \frac{P(Y)}{Q(Y)} \,\Big|\, H_0\right)$$

(since

$$D(P\|Q) = D(P_{Y|H_0}\|P_{Y|H_1}) = -\sum_{y \in \mathrm{supp}(P_{Y|H_0})} P_{Y|H_0}(y) \log_2 \frac{P_{Y|H_1}(y)}{P_{Y|H_0}(y)}).$$


Let $\alpha$, resp. $\beta$, denote the error probabilities of first, resp. second, kind:

$$\alpha := P(Y \in \mathcal{D}_1 | H_0) = \sum_{y \in \mathcal{D}_1} P(y),$$

$$\beta := P(Y \in \mathcal{D}_0 | H_1) = \sum_{y \in \mathcal{D}_0} Q(y).$$

Then we also have

$$D(P_{V|H_0}\|P_{V|H_1}) = -\alpha \log_2 \frac{1 - \beta}{\alpha} - (1 - \alpha) \log_2 \frac{\beta}{1 - \alpha}.$$

Theorem 6.2. It holds that

$$D(P_{V|H_0}\|P_{V|H_1}) \le D(P\|Q),$$

with equality iff for the likelihood ratio $L(y) := P(y)/Q(y)$ we have $L(y) = \ell_0$
for all $y \in \mathcal{D}_0$ and $L(y) = \ell_1$ for all $y \in \mathcal{D}_1$.

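Proof: Consider the events $A_i = \{V = i\}$ ($i = 0, 1$). We observe

$$D(P\|Q) = E\left(-\log_2 \frac{Q(Y)}{P(Y)} \,\Big|\, H_0\right)$$
$$= E\left(-\log_2 \frac{Q(Y)}{P(Y)} \,\Big|\, H_0 \cap A_0\right) P(A_0|H_0) + E\left(-\log_2 \frac{Q(Y)}{P(Y)} \,\Big|\, H_0 \cap A_1\right) P(A_1|H_0). \qquad (6.1)$$

However, $P_{Y|H_0 \cap A_0}(y) = P(y)/(1 - \alpha)$ if $y \in \mathcal{D}_0$ and 0 if $y \in \mathcal{D}_1$. So, by the
concavity of the log-function, it follows that

$$E\left(-\log_2 \frac{Q(Y)}{P(Y)} \,\Big|\, H_0 \cap A_0\right) \ge -\log_2 E\left(\frac{Q(Y)}{P(Y)} \,\Big|\, H_0 \cap A_0\right) = -\log_2 \sum_{y \in \mathcal{D}_0} \frac{Q(y)}{P(y)} \cdot \frac{P(y)}{1 - \alpha} = -\log_2 \frac{\beta}{1 - \alpha}.$$

Also $P(A_0|H_0) = 1 - \alpha$. By applying similar considerations for the second
summand in (6.1) we obtain the asserted inequality. □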

An important special case of the above theorem is the following estimate: If 
we choose the error probability of the first kind to be 0 (this is the usual 
assumption in cryptography; we do not want that an honest cryptogram of 
Alice will be thought of as fraudulent by Bob), then it follows that the error 
probability of the second kind has the following lower bound: 

Corollary 6.1. If $\alpha = 0$, then

$$\beta \ge 2^{-D(P\|Q)}.$$



Now we turn to the definition of mutual information. For this, we first define
the conditional entropy of the random variable X given Y:

$$H(X|Y) := E(-\log_2 P_{X|Y}(X|Y)) = -\sum_{(x,y) \in \mathrm{supp}(P_{(X,Y)})} P_{(X,Y)}(x, y) \log_2 P_{X|Y}(x|y).$$

Let us collect some properties of the conditional entropy in the form of
lemmas. They can be proved based on the relative entropy.

Lemma 6.5. It holds that

$$0 \le H(X|Y) \le H(X)$$

with equality on the left-hand side iff Y uniquely determines X and with
equality on the right-hand side iff X and Y are independent.

Proof: Let $P(x, y) := P_{(X,Y)}(x, y)$ and $Q(x, y) := P_X(x) P_Y(y)$. Then we
have

$$0 \le D(P\|Q) = -\sum_{(x,y) \in \mathrm{supp}(P_{(X,Y)})} P_{(X,Y)}(x, y) \log_2 \frac{P_X(x) P_Y(y)}{P_{(X,Y)}(x, y)}$$
$$= -\sum_{(x,y) \in \mathrm{supp}(P_{(X,Y)})} P_{(X,Y)}(x, y) \log_2 \frac{P_X(x) P_Y(y)}{P_{X|Y}(x|y) P_Y(y)}$$
$$= E(-\log_2 P_X(X) + \log_2 P_{X|Y}(X|Y))$$
$$= H(X) - H(X|Y)$$

with equality iff $P_{(X,Y)}(x, y) = P_X(x) P_Y(y)$ for all $(x, y) \in \mathrm{supp}(P_{(X,Y)})$, i.e.
iff X and Y are independent. □

Lemma 6.6.

$$H((X, Y)) = H(X) + H(Y|X).$$

Proof:

$$H((X, Y)) = E(-\log_2 P_{(X,Y)}(X, Y))$$
$$= E(-\log_2 (P_X(X) P_{Y|X}(Y|X)))$$
$$= E(-\log_2 P_X(X)) + E(-\log_2 P_{Y|X}(Y|X))$$
$$= H(X) + H(Y|X). \ □$$

Lemma 6.7.

$$H(X|(Y, Z)) \le H(X|Y)$$

with equality iff X and Z are independent, given that Y is known.

Proof: Put

$$P(x, y, z) := P_{(X,Y,Z)}(x, y, z)$$

and

$$Q(x, y, z) := P_Y(y) P_{X|Y}(x|y) P_{Z|Y}(z|y).$$

Then for the relative entropy we have

$$D(P\|Q) = -\sum_{(x,y,z) \in \mathrm{supp}(P_{(X,Y,Z)})} P_{(X,Y,Z)}(x, y, z) \log_2 \frac{P_Y(y) P_{X|Y}(x|y) P_{Z|Y}(z|y)}{P_{(X,Y,Z)}(x, y, z)}$$
$$= E\left(-\log_2 \frac{P_Y(Y) P_{X|Y}(X|Y) P_{Z|Y}(Z|Y)}{P_{(X,Y,Z)}(X, Y, Z)}\right).$$

But, since

$$P_{(X,Y,Z)}(X, Y, Z) = P_Y(Y) P_{X|Y}(X|Y) P_{Z|(X,Y)}(Z|(X, Y)),$$

it follows that

$$D(P\|Q) = E\left(-\log_2 \frac{P_{Z|Y}(Z|Y)}{P_{Z|(X,Y)}(Z|(X, Y))}\right) = H(Z|Y) - H(Z|(X, Y)) \ge 0,$$

with equality iff for all x, y, z

$$P_Y(y) P_{(X,Z)|Y}((x, z)|y) = P_Y(y) P_{X|Y}(x|y) P_{Z|Y}(z|y),$$

which (after interchanging the roles of X and Z) yields the assertion. □

This allows us to define the mutual information $I(X;Y) := H(X) - H(X|Y)$; this is the information that $Y$ gives about $X$. So Lemma 6.5 can be rewritten in the form

\[ 0 \le I(X;Y) \le H(X), \]

with equality on the left-hand side iff $X$ and $Y$ are independent and equality on the right-hand side iff $Y$ uniquely determines $X$. More generally, one can define

\[ I(X;Y|Z) := H(X|Z) - H(X|(Y,Z)) \]

as the information that $Y$ gives about $X$ if $Z$ is known. Then one also has

\[ 0 \le I(X;Y|Z) \le H(X|Z), \]

with equality on the left-hand side iff $X$ and $Y$ are independent given $Z$ and equality on the right-hand side iff $Y$ uniquely determines $X$ if $Z$ is given.
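
The inequalities of Lemma 6.5 and their equality cases can be checked numerically on small joint distributions. The following Python sketch is only an illustration: the function names and the three example distributions are our own assumptions, and the conditional entropy is computed through the chain rule of Lemma 6.6.

```python
from math import log2

def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axis):
    """Marginal law of component `axis` of a joint pmf over pairs (x, y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

def conditional_entropy(joint):
    """H(X|Y) = H((X,Y)) - H(Y), i.e. Lemma 6.6 with the roles of X and Y swapped."""
    return entropy(joint) - entropy(marginal(joint, 1))

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y)."""
    return entropy(marginal(joint, 0)) - conditional_entropy(joint)

# Hypothetical example distributions (not taken from the text).
correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
determined = {(0, 0): 0.5, (1, 1): 0.5}  # Y determines X
```

For `independent` the mutual information vanishes, for `determined` it equals $H(X) = 1$ bit, and for `correlated` it lies strictly in between.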

In cryptology, there is not only the problem of keeping a message secret, but also that of Bob being able to be "reasonably" sure that he really gets what Alice has sent to him, without Eve having changed the text. We underline that secrecy and authenticity/integrity are different properties; neither implies the other automatically.

One speaks of an impersonation attack in the case when Alice sends a cryptogram $Y$ to Bob and Eve, without observing $Y$, replaces it by some fraudulent cryptogram $Y'$. The impersonation attack succeeds if Bob can decrypt $Y'$ and accepts it (i.e. believes that it comes from Alice). We denote by $P_I$ the success probability of an impersonation attack if Eve uses an optimal strategy. There is a lower bound due to Simmons (1984) for this success probability:

Theorem 6.3. (Simmons' bound) Denote by $Z$ the key used. Then we have

\[ P_I \ge 2^{-I(Y;Z)}. \]

Proof: It is useful to interpret the impersonation attack as a statistical hypothesis-testing problem as follows: Denote by $H_0$ the null hypothesis that the cryptogram $Y'$ received by Bob is really the cryptogram $Y$ written by Alice, who used the key $Z = z$, hence $P(y) = P_{Y|Z}(y|z)$ for all $y$. As alternative, we consider the hypothesis $H_1$ that $Y'$ has been formed by Eve according to the probability

\[ Q(y) = P_Y(y) = \sum_z P_{Y|Z}(y|z)P_Z(z). \]

Then we have

    ... Z = z)        (6.2)

and thus

\[
\begin{aligned}
E(D(P\|Q)) &= \sum_z D(P\|Q)P_Z(z) \\
&= E\Bigl(-\log_2\frac{P_Y(Y)}{P_{Y|Z}(Y|Z)}\Bigr) \\
&= H(Y) - H(Y|Z) \\
&= I(Y;Z).
\end{aligned}
\]

But now from (6.2) and Corollary 6.1 we deduce that

\[
\begin{aligned}
P_I &\ge E(\beta) \\
&\ge E\bigl(2^{-D(P\|Q)}\bigr) \\
&\ge 2^{-E(D(P\|Q))} \\
&= 2^{-I(Y;Z)}.
\end{aligned}
\]



The significance of Simmons' bound is the following: Of course, in designing a cryptosystem where authenticity is important, one should take care that the success probability of an impersonation attack, and thus Simmons' bound, is as small as possible, i.e., the cryptogram should reveal a large amount of information about the key.

6.3 Marginal Guesswork

As we have stated at the end of Section 6.1, there are also other complexity measures than entropy. Here we will present the so-called marginal guesswork, which denotes, roughly speaking, the optimal number of trials necessary to be guaranteed a certain chance $\alpha$ of guessing a random value in a brute-force search. It will turn out that entropy and marginal guesswork have nothing to do with each other in the sense that there is no general inequality relating one to the other. Let $X$ be a random variable taking values in the alphabet $\mathcal{X} = \{x_1, x_2, \ldots\}$. While entropy measures how difficult it is to determine the value of $X$ given single queries to multiple oracles that answer questions of the type "Is $X(\omega) \in U$?" for subsets $U \subset \mathcal{X}$, marginal guesswork measures the difficulty of determining $X(\omega)$ with multiple queries submitted to a single oracle that answers questions "Is $X(\omega) = x$?". Let us go to a formal definition.

Assume w.l.o.g. that the probabilities $p_i := P(X = x_i)$ are sorted in non-increasing order:

\[ p_1 \ge p_2 \ge \ldots \ge p_n > p_{n+1} = \ldots = 0. \]

Then, for $0 < \alpha \le 1$, the $\alpha$-marginal guesswork is defined as

\[ w_\alpha(X) := \min\Bigl\{ i : \sum_{j=1}^{i} p_j \ge \alpha \Bigr\}. \]

Hence, $w_\alpha$ measures the maximum work for determining the value of the random variable $X$ when one wishes a probability of success of $\alpha$ in a brute-force search. The case $\alpha = 1$ is an exhaustive search. While in practice the search for cipher keys is often exhaustive, the guessing of passwords is rarely so (e.g., with UNIX).
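
The definition translates directly into a few lines of code. The following Python sketch is an illustration under our own assumptions: the function name is hypothetical, and the small tolerance `eps` is added only to absorb floating-point rounding in the cumulative sum.

```python
def marginal_guesswork(probs, alpha, eps=1e-12):
    """Smallest i such that the i largest probabilities sum to at least alpha.

    `probs` is any list of probabilities summing to 1; it is sorted here,
    so the caller need not provide it in non-increasing order.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    total = 0.0
    for i, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= alpha - eps:  # eps guards against rounding errors only
            return i
    raise ValueError("probabilities sum to less than alpha")
```

For the uniform distribution on 8 values, `marginal_guesswork([0.125] * 8, 0.5)` gives 4 and the exhaustive case `alpha = 1` gives 8.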

If the random variable $X$ is uniformly distributed on some subset of $\mathcal{X}$ (e.g. deterministic), then one sees at once that $H(X) \approx \log_2 w_\alpha(X)$. A similar relation holds for long random sequences with the "asymptotic equipartition property" (Pliam (2000), p. 73). However, the two uncertainty measures "entropy" and "marginal guesswork" can be completely different in the following sense:

Theorem 6.4. For each $0 < \alpha < 1$ and every positive number $N$, there are finitely supported random variables $X$ and $Y$ such that

\[ \log_2 w_\alpha(X) > H(X) + N \tag{6.3} \]

and

\[ H(Y) > \log_2 w_\alpha(Y) + N. \tag{6.4} \]

For the proof of Theorem 6.4 we need the following lemma:

Lemma 6.8. For every $\varepsilon > 0$ there exists a finitely supported random variable $X$ such that

\[ \sum_{i=1}^{2^{\lceil H(X)\rceil}} p_i < \varepsilon, \]

where $p_i := P(X = x_i)$ (and, w.l.o.g., $p_1 \ge p_2 \ge \ldots$).

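Proof: Define the random variable $X_{j,k}$ by the sequence of probabilities

\[ a^{-1}, a^{-2}, \ldots, a^{-k} \text{ followed by } m \text{ copies of } a^{-k}, \]

where $a = 2^j$ and $m$ is chosen so that all the above probabilities sum up to 1. One observes that we must have

\[ m = \frac{1 + (a-2)a^k}{a-1}. \]

Calculating the entropy gives

\[
\begin{aligned}
H(X_{j,k}) &= \sum_{i=1}^{k} a^{-i}\log_2 a^{i} + \frac{1 + (a-2)a^k}{(a-1)a^k}\log_2 a^k \\
&= j\,\frac{a^{k+1} - (k+1)a + k}{(a-1)^2 a^k} + jk\,\frac{1 + (a-2)a^k}{(a-1)a^k} \\
&= \frac{a-2}{a-1}\,jk + h_{j,k}
\end{aligned}
\]

with

\[ h_{j,k} := \frac{j(a^k - 1)}{(a-1)^2 a^{k-1}}. \]

Now we fix a lower bound $2 \le j$, hence we have $a \ge 4$. Then we get

\[ H(X_{j,k}) > \frac{a-2}{a-1}\,jk > \log_2 k \]

and hence

\[ s_{j,k} := \sum_{i=1}^{2^{\lceil H(X_{j,k})\rceil}} p_i \le \sum_{i=1}^{k} a^{-i} + \bigl(2^{\lceil H(X_{j,k})\rceil} - k\bigr)a^{-k}. \]

We obtain further

\[ s_{j,k} \le \frac{1}{2^j - 1} + \sigma_{j,k}, \]

where

\[ \sigma_{j,k} := \frac{(a-1)\bigl(2^{\lceil H(X_{j,k})\rceil} - k\bigr) - 1}{a^k(a-1)}. \]

If we can show that for fixed $j$ it holds that

\[ \sigma_{j,k} \to 0 \quad (k\to\infty), \tag{6.5} \]

then fixing a $j \ge 2$ such that $(2^j - 1)^{-1} < \varepsilon$ one can find a $k$ such that $s_{j,k} < \varepsilon$, which finishes the proof. So it remains to prove relation (6.5). Since $h_{j,k} \to ja/(a-1)^2 < 1$ $(k \to\infty)$, we may find an index $k(j)$ such that $h_{j,k} < 1$ for all $k \ge k(j)$. Hence for $k \ge k(j)$ we have

\[ \lceil H(X_{j,k})\rceil < \frac{a-2}{a-1}\,jk + 2 \]

and thus

\[ \sigma_{j,k} < \frac{(a-1)(4\beta^k - k) - 1}{a^k(a-1)} \]

with

\[ \beta := 2^{j(a-2)/(a-1)} < a. \]

Two applications of de l'Hospital's rule yield (6.5), as desired. □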

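Proof of Theorem 6.4: Let us first find $X$. We may assume w.l.o.g. that $N$ is an integer. We want to apply Lemma 6.8 with $\varepsilon := \alpha 2^{-N}$. Since the $p_i$ are non-increasing, we then have

\[ \sum_{i=1}^{2^{\lceil H(X)\rceil + N}} p_i \le 2^N \sum_{i=1}^{2^{\lceil H(X)\rceil}} p_i < 2^N\varepsilon = \alpha. \]

Hence

\[ w_\alpha(X) > 2^{\lceil H(X)\rceil + N} \ge 2^{H(X)+N}, \]

which proves (6.3).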

In order to prove (6.4), $Y$ will be defined as follows: Define the probabilities $q_1 := P(Y = y_1) := \alpha$ and

\[ q_i := P(Y = y_i) := (1-\alpha)2^{-k} \quad (2 \le i \le 2^k + 1). \]

(This corresponds to a Huffman tree with one leaf of depth 1 and $2^k$ leaves of depth $k$.) One observes that $w_\alpha(Y) = 1$, while

\[ H(Y) = -\alpha\log_2\alpha - (1-\alpha)\log_2\frac{1-\alpha}{2^k} = (1-\alpha)k + K(\alpha), \]

where $K(\alpha) := -\alpha\log_2\alpha - (1-\alpha)\log_2(1-\alpha)$. The choice

\[ k > \frac{N - K(\alpha)}{1-\alpha} \]

indeed yields (6.4). □
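
The construction of $Y$ is explicit enough to be checked numerically. The following Python sketch builds the distribution for given $\alpha$ and $N$ and evaluates both sides of (6.4); the function names and the rounding tolerance are our own assumptions, and the sketch is only meant for small $k$, since the support has $2^k + 1$ points.

```python
from math import floor, log2

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def guesswork(probs, alpha, eps=1e-12):
    """alpha-marginal guesswork; eps only absorbs floating-point rounding."""
    total = 0.0
    for i, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= alpha - eps:
            return i
    raise ValueError("probabilities sum to less than alpha")

def build_y(alpha, n):
    """One atom of mass alpha plus 2**k atoms of mass (1 - alpha) * 2**-k,
    with the smallest integer k >= 1 satisfying k > (n - K(alpha)) / (1 - alpha)."""
    big_k = -alpha * log2(alpha) - (1 - alpha) * log2(1 - alpha)
    k = max(1, floor((n - big_k) / (1 - alpha)) + 1)
    return k, [alpha] + [(1 - alpha) * 2.0 ** -k] * (2 ** k)

k, probs = build_y(0.5, 3)
```

With $\alpha = 1/2$ and $N = 3$ this gives $k = 5$, $w_\alpha(Y) = 1$ and $H(Y) = 3.5 > \log_2 1 + 3$.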




7 Tests for (Pseudo-) Random Number 
Generators 



In this chapter, we will present some statistical tests for (pseudo-)random number generators. As mentioned earlier, there is no "universal" test for randomness; only finitely many necessary conditions can be tested. We will follow in particular the list of tests that has been applied to evaluate the AES (Advanced Encryption Standard; as is known, the winner has been the RIJNDAEL algorithm, see, e.g., Banks et al. (2000) and all the other literature on the AES, much of it available on the Internet) and the test battery suggested by Rukhin (2000b). For complete proofs, we refer to the latter paper and the literature cited therein.



7.1 The Frequency Test and Generalized Serial Test 

Consider the piece

\[ x := (x_{-\nu+1}, \ldots, x_{N-1}) \]

of a bit sequence $\{x_n\}_{n\in\mathbb{Z}}$. From this piece, one can form $N$ overlapping $\nu$-grams of consecutive bits. Let $M_\nu$ denote the number of pairs of repeatedly occurring $\nu$-grams. For a fixed $\nu$-gram

\[ s := (s_{-\nu+1}, s_{-\nu+2}, \ldots, s_0) \]

it is convenient to denote the events

\[ A_s(x) := \{(x_{-\nu+1}, x_{-\nu+2}, \ldots, x_0) = s\} \]

and, more generally,

\[ D^m A_s(x) := \{(x_{-\nu+m+1}, x_{-\nu+m+2}, \ldots, x_m) = s\}. \]

Then one can write the test statistic as

\[ M_\nu = \frac{1}{2}\sum_{m,n=0;\, m\ne n}^{N-1}\ \sum_s 1(D^m A_s(x))\cdot 1(D^n A_s(x)) \]

D. Neuenschwander: Prob. and Stat. Methods in Cryptology, LNCS 3028, pp. 89-105, 2004. 

© Springer- Verlag Berlin Heidelberg 2004 
(where $1(\ldots)$ denotes the indicator function). For $\nu = 1$ we obtain just the usual frequency test. If one defines $n_s(x)$ as the frequency of occurrence of the $\nu$-gram $s$ in the piece $x$, then one can write

\[ M_\nu(x) = \frac{1}{2}\sum_s n_s(x)(n_s(x) - 1). \]

Let us determine the (asymptotic) distribution of $M_\nu(x)$ under the null hypothesis that $x$ consists of i.i.d. unbiased random bits. Since

\[ E(1(D^m A_s(x))) = 2^{-\nu}, \]

we obtain $E(n_s) = 2^{-\nu}N$. Let us first assume $n > m$ and $n - m < \nu$ (i.e., the "windows" are overlapping). One observes that $1(D^m A_s(x))\cdot 1(D^n A_s(x)) = 1$ is possible only if the first $n - m$ bits of $s$ are repeated, i.e. if $s$ has period $n - m$. There are exactly $2^{n-m}$ $\nu$-grams with this property, and for each of them

\[ E(1(D^m A_s(x))\cdot 1(D^n A_s(x))) = 2^{-(\nu+n-m)}, \]

and thus

\[ E\Bigl(\sum_s 1(D^m A_s(x))\cdot 1(D^n A_s(x))\Bigr) = 2^{-\nu}. \]

The same formula holds in the case of non-overlapping windows. Thus

\[ E(M_\nu) = N(N-1)2^{-(\nu+1)}. \tag{7.1} \]



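It is convenient to introduce the statistic

\[ L_\nu := \frac{2^{\nu+1}}{N}M_\nu. \]

If we compare it with the "goodness-of-fit" statistic

\[ \psi_\nu^2 := \sum_s \frac{(n_s - 2^{-\nu}N)^2}{2^{-\nu}N} = \frac{2^\nu}{N}\sum_s n_s(n_s - 1) + 2^\nu - N, \]

then we get

\[ L_\nu = \psi_\nu^2 - 2^\nu + N. \tag{7.2} \]

From (7.1) we obtain

\[ E(L_\nu) = \frac{2^{\nu+1}}{N}E(M_\nu) = N - 1. \]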
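
The statistics $M_\nu$, $L_\nu$ and the goodness-of-fit statistic are easy to compute from the $\nu$-gram counts, and relation (7.2) can be checked on any input. The following Python sketch is an illustration only; the function name and the example bit string are our own assumptions.

```python
from collections import Counter
from itertools import product

def serial_statistics(bits, nu):
    """Return (N, M, L, psi2) for a list of 0/1 values of length N + nu - 1.

    M    : number of pairs of repeatedly occurring nu-grams,
    L    : 2**(nu + 1) * M / N,
    psi2 : goodness-of-fit statistic over all 2**nu possible nu-grams.
    """
    n_grams = len(bits) - nu + 1
    counts = Counter(tuple(bits[i:i + nu]) for i in range(n_grams))
    m_stat = sum(c * (c - 1) for c in counts.values()) / 2
    l_stat = 2 ** (nu + 1) * m_stat / n_grams
    expected = n_grams / 2 ** nu
    psi2 = sum((counts.get(s, 0) - expected) ** 2 / expected
               for s in product((0, 1), repeat=nu))
    return n_grams, m_stat, l_stat, psi2

# Hypothetical example input.
bits = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
n_grams, m_stat, l_stat, psi2 = serial_statistics(bits, 2)
```

Relation (7.2) is a purely algebraic identity, so it holds for every input and not only under the null hypothesis.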
If by definition we put $\psi_0^2 := 0$, then (7.2) holds for all $\nu \ge 0$. By a theorem of Good (1953), (1957) about the asymptotic (as $N \to\infty$) $\chi^2$-distribution of certain statistics related to $\psi_\nu^2$ and some algebraic manipulations (involving first and second difference operators, applied to $L_\nu$) one obtains the asymptotic variance

\[ \operatorname{Var}(L_\nu) = 6(2^\nu - 1) - 4\nu \]

and asymptotic covariance (for $\nu_1 < \nu_2$)

\[ \operatorname{Cov}(L_{\nu_1}, L_{\nu_2}) = 2^{\nu_1+1}(\nu_2 - \nu_1 + 3) - 2(\nu_1 + \nu_2 + 3). \]

Thus, for $\nu_1, \nu_2 \to\infty$, $\nu_1 < \nu_2$, the correlation coefficient has the asymptotic behavior

\[ \rho(L_{\nu_1}, L_{\nu_2}) \sim \frac{\nu_2 - \nu_1 + 3}{3\cdot 2^{(\nu_2-\nu_1)/2}}. \]

7.2 Maximum Absolute Value of Random Walk Test

If the sequence to be tested is denoted by $\{x_n\}_{n\ge 1}$, then let $S_k$ be the $k$-th partial sum:

\[ S_k := \sum_{j=1}^{k} x_j. \]

Take the null hypothesis of i.i.d. unbiased random bits as before. From Révész (1990), p. 17 we have, for the maximal absolute partial sum, the relation

\[
\begin{aligned}
P\bigl(\max_{1\le k\le n}|S_k| \ge t\bigr) = 1 &- \sum_{4|k| < (n/t)+1} P((4k-1)t < S_n < (4k+1)t) \\
&+ \sum_{-(n/t)-3 < 4k < (n/t)-1} P((4k+1)t < S_n < (4k+3)t),
\end{aligned}
\]

which can be used by noting the fact that under the null hypothesis (the bits being coded as $\pm 1$), the statistic $(S_n + n)/2$ obeys a binomial distribution with parameters $n$ and $p = 1/2$. However, also (even for small values of $n$) the following approximation is valid:


P{ max Sk < \/nz) 

Kk<n 






7T ^ ' 2j + 1 

j=0 ^ 



8z2 



4 9^2 

1 ^exp(-^) 

3V^z ^ 2 ^ 



(see Rukhin (2000b)). 
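
The exact relation quoted from Révész can be evaluated with binomial probabilities and compared with a brute-force enumeration for small $n$. The following Python sketch does this under the assumption that the bits are coded as $\pm 1$; all function names are our own, and the summation range over $k$ is simply chosen large enough to cover every interval that can carry mass.

```python
from itertools import product
from math import comb

def pmf_sn(n, v):
    """P(S_n = v) for a symmetric random walk with steps +1 and -1."""
    if abs(v) > n or (n + v) % 2:
        return 0.0
    return comb(n, (n + v) // 2) / 2 ** n

def prob_open_interval(n, lo, hi):
    """P(lo < S_n < hi) for integers lo < hi."""
    return sum(pmf_sn(n, v) for v in range(lo + 1, hi))

def prob_max_abs_at_least(n, t):
    """P(max_{1<=k<=n} |S_k| >= t), evaluated by the reflection formula."""
    kmax = n // (4 * t) + 1  # intervals beyond this index carry no mass
    p = 1.0
    for k in range(-kmax, kmax + 1):
        p -= prob_open_interval(n, (4 * k - 1) * t, (4 * k + 1) * t)
        p += prob_open_interval(n, (4 * k + 1) * t, (4 * k + 3) * t)
    return p

def brute_force(n, t):
    """The same probability by enumerating all 2**n paths (small n only)."""
    hits = 0
    for steps in product((-1, 1), repeat=n):
        s, m = 0, 0
        for e in steps:
            s += e
            m = max(m, abs(s))
        hits += m >= t
    return hits / 2 ** n
```

In a test one would take $t$ to be the observed maximum, so that the returned probability plays the role of the p-value.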
7.3 Number of Visits of Random Walk Test

An excursion of the partial sum process mentioned in the previous section is a sequence of indexes

\[ \{i, i+1, \ldots, \ell : S_{i-1} = S_{\ell+1} = 0,\ S_k \ne 0\ (k = i, i+1, \ldots, \ell)\}. \]

Let $J$ be the number of excursions. The null hypothesis of i.i.d. unbiased random bits should be rejected if the value

\[ P(J < J(obs)) \approx \sqrt{\frac{2}{\pi}}\int_0^{J(obs)/\sqrt{n}} e^{-u^2/2}\,du \]

is too small (where $J(obs)$ means the observed value of $J$). If this is not the case, then the statistic $\xi(x)$, defined as the number of visits to $x$ ($x \ne 0$) in one excursion, should be calculated. Its distribution under the null hypothesis is known to be as follows:

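Proposition 7.1.

\[ P(\xi(x) = 0) = 1 - \frac{1}{2|x|} \]

and

\[ P(\xi(x) = k) = \frac{1}{4x^2}\Bigl(1 - \frac{1}{2|x|}\Bigr)^{k-1} \quad (k \ge 1). \]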
Proof: We first assume that the individual steps $\varepsilon_j$ are i.i.d. biased, i.e., $P(\varepsilon_j = 1) = p$, $P(\varepsilon_j = -1) = q < p$. In a finite excursion it holds that $\xi(x) = k \ge 1$ iff the random walk $\{S_k\}_{k\ge 0}$ attains the level $x$, then visits $x$ exactly $k - 1$ times before it finally returns to zero. Hence the shifted random walk $\{S_k - x\}_{k\ge 0}$ never attains the value $-x$ during its first $k - 1$ excursions. Denote $\rho := \min\{k \ge 1 : S_k = 0\}$. Since the individual excursions are independent, one obtains

\[ P(\xi(x) = k, \rho < \infty) = P(\xi(x) > 0)\,\bigl(P(\xi(-x) = 0, \rho < \infty)\bigr)^{k-1}\,P(\xi(-x) > 0). \tag{7.3} \]

Assume first that $x > 0$. Let $\pi$ be the probability that $\{S_k\}_{k\ge 0}$ visits $x - 1$ before $-1$. We get

\[ P(\xi(x) > 0) = P(S_1 = 1)\pi. \]

The probability $\pi$ can be calculated by Pascal's ruin problem (see, e.g., Feller (1968), Section XIV.2, (2.5)), with which we obtain

\[ P(\xi(x) > 0) = \frac{q - p}{(q/p)^x - 1}. \]







Similarly, 



hence together 



P(C(-x) > 0) 



p-q 

{p/qY - 1 ’ 



P(e(x) > 0) 



1 - {q/pY 



for all $x$; letting $p \to 1/2$ proves the first assertion of the proposition. On the other hand, if we put $I := 1\{x > 0\}$, then the above gives



PYY) = o,p < oo) = 1 - \ - (1 - i){p-q), (7-4) 



since 

PYY) = 0, p = oo) = (1 - I)P{p = oo) = {1 - I){p - q). 
Substituting this in (7.3) yields 



P{^{x) = k, p < oo) 



p-q 

\{p/qY - 1 



-I{p-q)) 



k-1 



(7.5) 



(7.6) 



If one has an infinite excursion, then $\xi(x) = k$ iff $\{S_k\}_{k\ge 0}$ visits $x$, then returns to $x$ exactly $k - 1$ times (but does not return to 0) without visiting $x$ and 0 afterwards. This is possible only if $x > 0$, since otherwise $\{S_k\}_{k\ge 0}$ must attain 0 again. So if $I = 0$, then $P(\xi(x) = k, \rho = \infty) = 0$. If we replace $x$ by $-x$ in (7.4), we obtain



P(^(x) = k,p = oo) 



P(e(x) > 0)P(C(-x) = 



I{p-qf 

l-(f)" 



( 1 - 



p-q 

(EY - 1 



0,p<ooY ^P{p = oo) 
-{p-q)Y~\ 



Adding this to (7.6) and letting $p \to 1/2$, the second assertion of the proposition follows. □

Now one can test the observed values $\xi(x)(obs)$ against the theoretical ones by a chi-square test.



7.4 Run Tests 

There are different definitions of runs in bit sequences. In the sequel, we will use the definition due to Feller: A bit sequence of length $n$ contains as many 0-runs of length $m$ as there are non-overlapping uninterrupted blocks containing exactly $m$ zeroes each. The 1-runs are defined similarly. Let

\[ \mu := 2^{m+1} - 2 \]

and

\[ \sigma^2 := 2^{2(m+1)} - (2m+1)2^{m+1} - 2 \]

and define $W(m,n)$ to be the number of runs of length $m$. By the limit theorem (see Rukhin (2000b), (5))

\[ P\Bigl(\frac{\sqrt{\mu}\,(\mu W(m,n) - n)}{\sigma\sqrt{n}} \le z\Bigr) \to \Phi(z) \quad (n\to\infty) \]

($\Phi(z)$ denoting the standard normal distribution function), it holds that for

\[ z(obs) = \sqrt{\mu}\,(\mu W(m,n)(obs) - n)/(\sigma\sqrt{n}) \]

the asymptotic p-value (as $n \to\infty$) is given by

\[ p = 2(1 - \Phi(|z(obs)|)). \]



Now we consider the case where also m —>■ oo, like 



In this case, $W(m,n)$ tends weakly to a Poisson distribution with parameter $\lambda$ (see Rukhin (2000b), (7), Barbour et al. (1992), Section 8.4). On the other hand, if one denotes by $\tilde{W}(m,n)$ the number of overlapping runs of length $m$, then $\tilde{W}(m,n)$ tends weakly to the so-called Pólya-Aeppli distribution with Laplace transform (moment-generating function)



A(e‘^) = exp( 



A(e* - 1) 

1 - eV2 



) 



(see Rukhin (2000b), (8)). This turns out to be a compound Poisson distri- 
bution, i.e. it corresponds to a random variable U with law 



= = („> 1 ) 



(see Rukhin (2000b), 5.1). This latter expression can also be written in terms of the confluent hypergeometric function ${}_1F_1$:

P{U = u) = ^iF^{u+ 1,2, X/2) (u>l). 

To use this result for a test, one partitions the observed bit sequence into $N$ substrings and the empirical frequencies within each such substring are combined by the $\chi^2$-statistic.

To test randomness, the Longest Run Test is also appropriate. Let $\nu$ denote the length of the longest run in a sequence of length $n = MN$ ($N$ blocks of size $M$). If in the block (of size $M$) one has $r$ ones and if we put $u := \min\{M - r + 1, \lfloor r/(k+1)\rfloor\}$, then

\[ P(\nu \le k \mid r) = \binom{M}{r}^{-1}\sum_{j=0}^{u}(-1)^j\binom{M-r+1}{j}\binom{M - j(k+1)}{M-r} \]

(see Rukhin (2000b), p. 117, Barton, David (1962)).



7.5 Tests on Frequencies of Patterns 



Let

\[ s = (s_1, s_2, \ldots, s_m) \]

be a nonperiodic pattern (template) of length $m = \log_2(n/\lambda)$. Let $W(m,n)$ be the number of occurrences of $s$ in a bit string of length $n$. Take the usual null hypothesis that the bit string consists of i.i.d. unbiased random bits. By interpreting $W(m,n)$ suitably as a sum of indicator functions that the observed substring of length $m$ coincides with $s$, one obtains

\[ E(W) = (n - m + 1)2^{-m}. \]

If $2^{-m}n \to\lambda$, then $W(m,n)$ has asymptotically a Poisson distribution with parameter $\lambda$ (Rukhin (2000b), 5.1, Barbour et al. (1992), Section 8.4). If only $n \to\infty$ and $m$ remains fixed, then the law of $W(m,n)$ (under suitable normalization) tends to a standard normal distribution (see also Rukhin (2000b), 5.1).
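
The expectation $E(W) = (n-m+1)2^{-m}$ can be verified exactly for small $n$ by averaging the number of occurrences over all bit strings. The following Python sketch is an illustration; the function names and the example template are our own assumptions, and bit strings are represented as Python strings of the characters 0 and 1.

```python
from itertools import product

def count_occurrences(bits, template):
    """Number of (possibly overlapping) occurrences of template in bits."""
    m = len(template)
    return sum(1 for i in range(len(bits) - m + 1) if bits[i:i + m] == template)

def expected_occurrences(n, m):
    """E(W) = (n - m + 1) * 2**-m under i.i.d. unbiased bits."""
    return (n - m + 1) / 2 ** m

def average_over_all_strings(n, template):
    """Exact mean of W over all 2**n bit strings of length n (small n only)."""
    total = sum(count_occurrences("".join(w), template)
                for w in product("01", repeat=n))
    return total / 2 ** n
```

For the hypothetical template `"001"` and $n = 6$ both functions return $0.5$.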



7.6 Tests Based on Missing Words 

We first make a preparation on correlation polynomials. Denote, as usual, by $\{x_n\}_{n\ge 1}$ a sequence of i.i.d. unbiased random bits. Let $s, t \in \mathbb{B}^m$ be aperiodic templates (patterns) of length $m$. Then the correlation polynomial is defined as



k=l 

(where $\delta_{s,t}$ denotes the Kronecker symbol: $\delta_{s,t} := 1$ if $s = t$ and zero else).

The autocorrelation polynomial is defined as

\[ A_s(z) := C_{s,s}(z). \]

By the aperiodicity of $s$, we have $A_s(z) = z^{m-1}$. The autocorrelation matrix is defined as the $(2\times 2)$-matrix

\[ A(z) := \begin{pmatrix} A_s(z) & C_{s,t}(z) \\ C_{t,s}(z) & A_t(z) \end{pmatrix}. \]


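Let $\pi_s(n)$ be the probability that the template $s$ is missing in $\{x_1, x_2, \ldots, x_n\}$. Then by a theorem due to Guibas and Odlyzko (1981) we have that

\[ F_s(z) := \sum_{n=0}^{\infty}\frac{\pi_s(n)}{z^n} = \frac{zA_s(z)}{(z-1)A_s(z) + 2^{-m}}. \tag{7.7} \]

Let us write the above expression in the form

\[ \sum_{n=0}^{\infty}\frac{\pi_s(n)}{z^n} = \frac{zA_s(z)}{P(z)}, \tag{7.8} \]

where

\[ P(z) = \prod_{j=1}^{m}(z - z_j) \tag{7.9} \]

is a polynomial of degree $m$ with leading coefficient 1. On the other hand, we observe that (since $zA_s(z) = z^m$)

\[ \frac{zA_s(z)}{P(z)} = \prod_{j=1}^{m}\Bigl(1 - \frac{z_j}{z}\Bigr)^{-1} = \sum_{n=0}^{\infty} z^{-n}\sum_{k_1+\ldots+k_m=n}\ \prod_{j=1}^{m} z_j^{k_j}. \tag{7.10} \]

By comparing coefficients in (7.7) and (7.10) we obtain finally

\[ \pi_s(n) = \sum_{k_1+\ldots+k_m=n}\ \prod_{j=1}^{m} z_j^{k_j}. \]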


Let $n, m \to\infty$ such that

\[ n2^{-m} \to a > 0. \]

One can show that in this case the asymptotic behavior of $\pi_s(n)$ is of the form

\[ \pi_s(n) \sim e^{-a}\bigl(1 - (2m-1)a2^{-(m+1)} + (m-1)2^{-m}\bigr) \tag{7.11} \]

(see Rukhin (2000b), p. 120). Let $X$ be the number of missing templates of length $m$ in $\{x_1, x_2, \ldots, x_n\}$. Then one obtains the following asymptotic behavior of the expectation of $X$:

\[ E(X) = e^{-a}2^m + e^{-a}(m - 1 - a/2) + O(1) \]

(Rukhin (2000b), (18)). By a similar, but of course more cumbersome procedure, one can also derive the variance of $X$:

\[ \operatorname{Var}(X) = \sum_s \pi_s(n)(1 - \pi_s(n)) + \sum_{s\ne t}\bigl(\pi_{s,t}(n) - \pi_s(n)\pi_t(n)\bigr), \]

where $\pi_{s,t}(n)$ denotes the probability that templates $s$ and $t$ are missing in $\{x_1, x_2, \ldots, x_n\}$ (Rukhin (2000b), (19)). These probabilities can be determined from the equation (by comparison of coefficients)

\[ \sum_{n=0}^{\infty}\frac{\pi_{s,t}(n)}{z^n} = z\cdot\det(A(z))\Bigl((z-1)\det(A(z)) + 2^{-m}\bigl(A_s(z) + A_t(z) - C_{s,t}(z) - C_{t,s}(z)\bigr)\Bigr)^{-1}. \]

The asymptotic behavior of the probabilities $\pi_{s,t}(n)$ is of the following type: If $C_{s,t}(z) = C_{t,s}(z) = 0$, then

\[ \pi_{s,t}(n) = e^{-2a}\bigl(1 + (m - 1 - (2m-1)a)2^{-(m-1)}\bigr) + O(2^{-2m}). \]

If $C_{s,t}(z)$ is of degree $m - 1 - u$ and $C_{t,s}(z) = 0$, then

\[ \pi_{s,t}(n) = e^{-2a}(1 + a2^{-u}) + O(2^{-\min\{m,2u\}}). \]

It turns out that the main contribution is given by these two types of pairs.

7.7 Approximate Entropy Test 

A more general notion than entropy is the so-called $\phi$-entropy. Assume $\phi : [0,1] \to\mathbb{R}$ is a convex $C^2$-function with $\phi(1) = 0$. Then the $\phi$-entropy of a random variable $X$ having discrete distribution $\mu$ with atoms $p_1, p_2, \ldots, p_M > 0$ (at some places) is defined as

\[ \sum_{j=1}^{M} p_j\phi(p_j). \]

The $\phi$-entropy with $\phi = -\log_2$ is just the usual entropy. If $M = 2^m$ and the probability law $\mu$ is just the distribution of all templates $s$ of length $m$, then one defines the $\phi$-uncertainty as
($\nu_s$ denoting the relative frequency of the template $s$ in the augmented (circular) extension of the original bit string). In order to get a limiting distribution for this statistic (under the usual null hypothesis), we normalize it as follows:



with 

2™ 

■“ 0/(2-m)2-(m+l)0//(2-m)- 

Now the ’’approximate </>-entropy of order m” is defined as 

AH{m) = 

(see Rukhin (2000a, b)). The limit distribution (as n ^ oo, n denoting again 
the length of the bitstring to be tested), after centering, is as follows: 

£{nAH{m) - ^ y 2 ( 2 -+i _ 2 ™) 

(see Rukhin (2000b), p.l23, Mitra, Rao (1971), Theorem 9.2.2). If we define 
the classical Pearson y^-statistics as i.e., 

2 ^ {ws - n2~'^f 
^ n2-™ 

S 

then its relation to is that 

\ J /‘2 

,p(rn) _ _| Wl 

n ’ 

and thus for the difference sequence 

:= - '^m -1 = nAH{m - 1), 

whose law converges weakly, as $n \to\infty$, to $\chi^2(2^{m-1})$. Also, it can be shown that the laws of the second differences

\[ (\psi_m^2 - \psi_{m-1}^2) - (\psi_{m-1}^2 - \psi_{m-2}^2) \]

converge weakly (as $n \to\infty$) to $\chi^2(2^{m-2})$. The Pearson $\chi^2$-statistic as mentioned here corresponds to the choice of $\phi(u) = u$ (see Rukhin (2000b), p. 124; see also Billingsley (1956)).



7.8 The Ziv-Lempel Complexity Test 

Here, the test statistic $W(n)$ is defined (recursively) as the number of words that arise if the bit sequence of length $n$ is parsed into consecutive disjoint words such that the next word is the shortest template not seen before. Loosely speaking, this is a test of compressibility of the source. The statistic $W(n)$ behaves as follows (under the null hypothesis): First,

\[ \frac{E(W(n))}{n/\log_2 n} \to 1 \quad (n\to\infty) \]

(a result due to Aldous, Shields (1988); see Rukhin (2000b)). Furthermore, there is a normalization $\sigma(W(n)) > 0$ such that the distribution of

\[ \frac{W(n) - E(W(n))}{\sigma(W(n))} \]

tends weakly to a standard normal law. The value of $\sigma$ has the property

\[ \sigma^2(W(n)) \sim \frac{n(C + h(\log_2 n))}{\log_2^3 n} \quad (n\to\infty), \]

where $C \approx 0.26600$ is a constant and $h$ is a random slowly varying continuous function with zero mean and $|h(\cdot)| < 10^{-6}$ (Kirschenhofer et al. (1994), see Rukhin (2000b), p. 125).
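
The parsing rule behind $W(n)$ can be sketched in a few lines. In the following Python illustration the function name is our own, bit sequences are represented as strings, and counting a trailing incomplete word (one that has already been seen when the input ends) as a word of its own is an assumption of this sketch, not something fixed by the text.

```python
def lz_word_count(bits):
    """Number of words when `bits` (a string of 0s and 1s) is parsed so that
    each new word is the shortest one not seen before."""
    seen = set()
    word = ""
    count = 0
    for b in bits:
        word += b
        if word not in seen:   # shortest unseen word completed
            seen.add(word)
            count += 1
            word = ""
    if word:                   # assumption: a trailing repeated word is counted
        count += 1
    return count
```

For instance, the string `1011010100010` is parsed as 1 | 0 | 11 | 01 | 010 | 00 | 10, i.e. into 7 words.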



7.9 Maurer’s “Universal Test” 

Maurer calls his test "universal" because "it can detect any significant deviation of a device's output statistics from the statistics of a truly random bit source when the device can be modeled as an ergodic stationary source with finite memory but arbitrary (unknown) state transition probabilities" (Maurer (1992)). The statistic of Maurer's test is closely related to the entropy per bit of the source, which is "the correct quality measure for a secret-key source in a cryptographic application" (Maurer (1992)). Perhaps, in our context, the word "universal" should be written in quotation marks; as we have stated before, there are no practically implementable universal (in the literal sense of the word as used in our text) tests of randomness. Like the previously discussed Ziv-Lempel Complexity Test, the statistic of Maurer's test measures the compressibility of the sequence. If the bit stream is significantly compressible, then it should be considered as non-random. Maurer (1992) rather discourages using the Ziv-Lempel complexity test. On the other hand, the disadvantage of Maurer's test is that one must have an $x$ of length $n$ where $n$ is of the order $10\cdot 2^L + 1000\cdot 2^L$ with $6 \le L \le 16$. The first $Q = 10\cdot 2^L$ blocks of $L$ bits serve as initialization blocks, whereas the last $K := \lfloor n/L\rfloor - Q$ blocks of length $L$ are the test blocks. The size of $Q$ makes sure that with high probability, all $L$-bit strings occur in the initialization blocks. Now Maurer's test statistic is the following:

\[ f_n := \frac{1}{K}\sum_{i=Q+1}^{Q+K}\log_2 t_i, \]

where $t_i$ denotes the number of indices since the previous occurrence of the $i$-th template. In other words, the test consists of looking back through the entire sequence while inspecting the test segment of $L$-bit blocks, checking for the nearest previous $L$-bit template match, and recording the distance (in number of blocks) to that previous match. One finds that under the null hypothesis of i.i.d. unbiased bits, the expectation of $f_n$ is given as

\[ E(f_n) = 2^{-L}\sum_{t=1}^{\infty}(1 - 2^{-L})^{t-1}\log_2 t. \]

The variance can be approximately calculated as follows:

\[ \operatorname{Var}(f_n) = \frac{c(L,K)^2}{K}\operatorname{Var}(\log_2 G), \]

where $G$ denotes a geometrically distributed random variable (with parameter $1 - 2^{-L}$) and $c(L,K)$ has the approximate value

\[ c(L,K) \approx 0.7 - \frac{0.8}{L} + \Bigl(1.6 + \frac{12.8}{L}\Bigr)K^{-4/L}. \]

However, Coron and Naccache (1999), who confirmed this approximation, warn that "the inaccuracy due to this approximation can make the test to be 2.67 times more permissive than what is theoretically admitted". So it is also reasonable to test the hypothesis of randomness by verifying the normality of the observed values $f_n(obs)$ by the t-test, where the variance is unknown. For running this t-test, one should partition the observed sequence into a number of, say, $r \le 20$ substrings, for each of which one evaluates the test statistic, then calculates the sample variance, and finally determines the p-value from the t-distribution with $r - 1$ degrees of freedom (Rukhin (2000b)).
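
The test statistic and its expectation can be sketched as follows. This Python illustration rests on assumptions of our own: the function names, the truncation of the infinite series after `terms` summands, and the convention that a test block that never occurred before is assigned its own block index as distance.

```python
from math import log2

def maurer_statistic(bits, block_len, init_blocks):
    """f_n: average log2-distance (in blocks) to the previous occurrence of each
    test block. `bits` is a list of 0/1; the first `init_blocks` blocks initialize."""
    n_blocks = len(bits) // block_len
    blocks = [tuple(bits[i * block_len:(i + 1) * block_len]) for i in range(n_blocks)]
    test_blocks = n_blocks - init_blocks
    last_seen = {}
    for i in range(init_blocks):
        last_seen[blocks[i]] = i + 1            # 1-based block index
    total = 0.0
    for i in range(init_blocks, n_blocks):
        # Assumption: an unseen block gets distance i + 1 (its own index).
        total += log2(i + 1 - last_seen.get(blocks[i], 0))
        last_seen[blocks[i]] = i + 1
    return total / test_blocks

def expected_statistic(block_len, terms=20000):
    """Truncated series 2**-L * sum_{t>=1} (1 - 2**-L)**(t-1) * log2(t)."""
    p = 2.0 ** -block_len
    return p * sum((1 - p) ** (t - 1) * log2(t) for t in range(1, terms + 1))
```

As a quick plausibility check, a sequence that cycles through the four 2-bit blocks in a fixed order has every distance equal to 4, so the statistic equals $\log_2 4 = 2$; for $L = 6$ the series evaluates to approximately 5.2177.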



7.10 Rank of Random Matrices Test 



Let $R$ be the rank of an $M \times Q$ random matrix with entries in $\mathbb{B}$. The possible values of $R$ are $0, 1, \ldots, m := \min\{M, Q\}$. By a calculation due to Kovalenko (1972), the random variable $R$ obeys (under the null hypothesis) the following distribution:

\[ P(R = r) = 2^{r(Q+M-r)-MQ}\prod_{i=0}^{r-1}\frac{(1 - 2^{i-Q})(1 - 2^{i-M})}{1 - 2^{i-r}}. \]

Take $M = Q \ge 10$. Then we may approximate

\[ P(R = M) \approx \prod_{j=1}^{\infty}(1 - 2^{-j}) \approx 0.2888\ldots, \]

\[ P(R = M-1) \approx 2P(R = M) \approx 0.5776\ldots, \]

\[ P(R = M-2) \approx \tfrac{4}{9}P(R = M) \approx 0.1284\ldots, \]

whereas $P(R = r) \le 0.005$ for $r \notin \{M-2, M-1, M\}$. Let $N \approx n/M^2$ (where $n$ is the length of the observed bit string). So $N$ can be interpreted as the new "sample size", i.e., we can form $N$ random (square) matrices with the observed input sequence. We calculate their ranks $R_1, R_2, \ldots, R_N$ and determine the frequencies $F_M, F_{M-1}, F_{M-2}$ of the rank values $M, M-1, M-2$ resp. among the $R_1, \ldots, R_N$. Then we apply the chi-square test: The test statistic

\[ \chi^2 = \frac{(F_M - 0.2888N)^2}{0.2888N} + \frac{(F_{M-1} - 0.5776N)^2}{0.5776N} + \frac{(N - F_M - F_{M-1} - 0.1336N)^2}{0.1336N} \]

has, under the null hypothesis of independent unbiased bits, a chi-square distribution with 2 degrees of freedom. The p-value is

\[ p = \exp(-\chi^2(obs)/2). \]
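
Both ingredients of this test, the rank distribution and the rank computation over the field with two elements, are short to write down. The following Python sketch is an illustration with names of our own choosing; matrix rows are assumed to be packed into integers, one bit per column.

```python
def rank_probability(r, rows, cols):
    """P(R = r) for a uniformly random rows x cols binary matrix."""
    prod = 1.0
    for i in range(r):
        prod *= (1 - 2.0 ** (i - cols)) * (1 - 2.0 ** (i - rows)) / (1 - 2.0 ** (i - r))
    return 2.0 ** (r * (cols + rows - r) - rows * cols) * prod

def gf2_rank(matrix_rows, ncols):
    """Rank over GF(2) of a matrix whose rows are given as integers (bit masks)."""
    rows = list(matrix_rows)
    rank = 0
    for col in reversed(range(ncols)):
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]   # eliminate this column in all other rows
        rank += 1
    return rank
```

For $M = Q = 32$ the three largest values of `rank_probability` reproduce the constants 0.2888, 0.5776 and 0.1284 quoted above to the stated precision.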



7.11 Linear Complexity Test 

The linear complexity of a finite bit sequence $x = \{x_i\}_{0\le i\le n}$ of length $n + 1$ is defined as the length of the shortest LFSR over the field $\mathbb{B}$ that generates $x$. We refer to Section 5.1 for more information about LFSRs. Such a shortest LFSR can be determined by the famous Berlekamp-Massey algorithm, which we will present in the following. (A generalization of the Berlekamp-Massey algorithm to residue rings was given by Reeds, Sloane (1985).) We stress that high linear complexity is by far not sufficient for a sequence to be considered as "random". E.g., the sequence $0, 0, \ldots, 0, 1$ has maximal linear complexity, but is very "regular"! Let $L$ be the length of an LFSR. We can say that the LFSR $(c_0, c_1, \ldots, c_L) \in \mathbb{B}^{L+1}$ (where $c_L = 1$) generates the sequence $x$ if

\[ \sum_{i=0}^{L} c_i x_{j+i} = 0 \quad (0 \le j \le n - L). \]

For the Berlekamp-Massey algorithm it is useful to work with polynomials, since polynomial rings have "nice" algebraic properties. So we call the polynomial

\[ C_{(n)}(z) := \sum_{i=0}^{L} c_i z^i \]

(which is of degree $L$) the characteristic polynomial (or recursion polynomial) of the LFSR (see Section 5.1). In order to make explicit the dependence of the coefficients on $n$, let us write $c_i =: c_{n,i}$ in the above formula. The Berlekamp-Massey algorithm is recursive and runs as follows: Suppose we have found a characteristic polynomial $C_{(k-1)}(z)$ of degree $L_{k-1}$ that generates the partial sequence $x^{(k)} = \{x_i\}_{0\le i\le k-1}$ of length $k$. We want to find a characteristic polynomial $C_{(k)}(z)$ that generates $x^{(k+1)}$. The length $L_k$ of this new LFSR will be the degree of $C_{(k)}(z)$. So we have

\[ \sum_{i=0}^{L_{k-1}} c_{k-1,i}\,x_{j+i} = 0 \quad (0 \le j \le k - 1 - L_{k-1}). \]

On the other hand,

\[ \sum_{i=0}^{L_{k-1}} c_{k-1,i}\,x_{k-L_{k-1}+i} = \delta_k \]

for certain $\delta_k \in \mathbb{B}$. Now the recursion step is as follows: If $\delta_k = 0$, then $C_{(k-1)}(z)$ also generates $x^{(k+1)}$ and we can take

\[ C_{(k)}(z) = C_{(k-1)}(z) \]

and hence

\[ L_k := L_{k-1}. \]



The more difficult case is when $\delta_k = 1$. Let $m$ be such that

\[ L_{m-1} < L_m = L_{k-1} \]

(i.e., $L_{m-1}$ is the length of the LFSR before the last jump of the length in the recursion). Then we have

\[ L_k = \max\{L_{k-1}, k + 1 - L_{k-1}\} \tag{7.12} \]

and one possible choice for a characteristic polynomial $C_{(k)}(z)$ for $x^{(k+1)}$ is the following:

\[ C_{(k)}(z) := z^{L_k - L_{k-1}}C_{(k-1)}(z) - z^{L_k - (k-m) - L_{m-1}}C_{(m-1)}(z). \tag{7.13} \]

(Remark: In general, the characteristic polynomials of minimal degree are not uniquely determined. One can find all of them, but this is not of significance here, since we are only interested in the length of the shortest LFSR.)
Proof of the Berlekamp-Massey Algorithm: 1. First we prove the inequality

\[ L_k \ge \max\{L_{k-1}, k + 1 - L_{k-1}\}. \tag{7.14} \]

The relation $L_k \ge L_{k-1}$ being trivial, it remains to prove

\[ L_k \ge k + 1 - L_{k-1}. \tag{7.15} \]

Assume that $C_{(k-1)}(z)$ generates $x^{(k)}$ but not $x^{(k+1)}$. Then it is not difficult to show (by some elementary manipulations with Laurent series) that there exist polynomials $p(z)$ of degree $< L_{k-1}$ and $\tilde{p}(z)$ of degree $< L_k$ such that the following Laurent series expansions hold:

\[ \frac{p(z)}{C_{(k-1)}(z)} = \sum_{i=0}^{k-1} x_i z^{-(i+1)} + y_k z^{-(k+1)} + \ldots \]

and

\[ \frac{\tilde{p}(z)}{C_{(k)}(z)} = \sum_{i=0}^{k} x_i z^{-(i+1)} + \ldots \]

such that $x_k \ne y_k$. Hence it follows that

\[ p(z)C_{(k)}(z) - \tilde{p}(z)C_{(k-1)}(z) = C_{(k-1)}(z)C_{(k)}(z)\bigl((y_k - x_k)z^{-(k+1)} + \ldots\bigr). \]

Since $x_k \ne y_k$, it follows that $L_{k-1} + L_k - (k+1) \ge 0$, which proves (7.15).
2. Now we are ready to prove (7.12) by induction. The induction begins at the first jump of the sequence $\{L_i\}$, i.e., for

\[ x^{(\ell+1)} = (0, 0, \ldots, 0, 1) \in \mathbb{B}^{\ell+1}. \]

Here we have $L_{\ell-1} = 0$ and $L_\ell = \ell + 1 = \max\{0, \ell + 1 - 0\}$, hence (7.12) (for the beginning of the induction) is fulfilled. For the induction step, assume that (7.12) is valid for $k = m$, i.e.,

\[ L_m = L_{k-1} = \max\{L_{m-1}, m + 1 - L_{m-1}\}. \]

Since $L_m > L_{m-1}$ it follows that $L_{k-1} = L_m = m + 1 - L_{m-1}$, hence

\[ k - m + L_{m-1} = k + 1 - L_{k-1}. \tag{7.16} \]

Now since $C_{(k)}(z)$ is of degree $L_k$ by definition, it suffices to show that it indeed generates $x^{(k+1)}$, for in this case it is (by 1.) a generating LFSR of minimal length. Put

k 

2=0 

Write the Laurent series expansions 

oo 

C(m-l)i.z)x'^'""^^\z) =: 

i=0

and
oo 

Then one can show that 



= 0 (0 < i < m - 1 - 

Otrn—Lm — l — ^5 

Pi = 0 {0 < i < k - 1 - Lk-i), 

(3k-Lk-i = 1 - 

If we develop, furthermore, 

I 

and 

i 

then we get the facts that 



'Yi_L^=0 {Lk-i < i < k - 1), 
Ik-Lk = 1 ) 

= 0 {k - m + Ljn -1 <i <k - 1), 
Ik-Lk = 1 - 

Hence we obtain 



Vj-Lk =0 {k + 1- Lk-i < j < k - 1) 

and 

l^k-Lk = 1 - 

Hence in the power series expansion of the product 



C(k){z)x^^^'^\z) =: 

2=0 



we have nj = 0 for 0 < j < k — Lk, which means that C(k){z) indeed generates 
the sequence . □ 

Rueppel (1986) gives the distribution of $L_n$ under the null hypothesis of a genuine random sequence: From (7.12) it follows that $N_n(L)$, which means the number of bit sequences of length $n$ and linear complexity $L$, is given by

\[
N_n(L) = \begin{cases}
2N_{n-1}(L) + N_{n-1}(n-L) & n \ge L > n/2, \\
2N_{n-1}(L) & L = n/2, \\
N_{n-1}(L) & n/2 > L \ge 0.
\end{cases}
\]

From this, the following corollary can be proved by induction on $n$:

Corollary 7.1. $P(L_n = 0) = 2^{-n}$ and

\[ P(L_n = L) = \frac{2^{\min\{2n-2L,\,2L-1\}}}{2^n} \quad (1 \le L \le n). \]

By standard analytic calculations, the expectation, variance, and (more generally) the generating function can be evaluated:

\[ E(L_n) = \frac{n}{2} + \frac{9 - (-1)^n}{36} - 2^{-n}\Bigl(\frac{n}{3} + \frac{2}{9}\Bigr) \]

(see Rukhin (2000b)).
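
For small $n$ the counts behind Corollary 7.1 can be reproduced by brute force. The following Python sketch uses the common textbook formulation of the Berlekamp-Massey algorithm over the field with two elements (with a connection polynomial and a discrepancy), which is an assumption of this sketch and not a literal transcription of the recursion (7.13); only the resulting length, i.e. the linear complexity, is compared. All names are our own.

```python
from itertools import product

def linear_complexity(bits):
    """Length of the shortest LFSR over GF(2) generating `bits` (Berlekamp-Massey)."""
    n = len(bits)
    c = [1] + [0] * n   # current connection polynomial
    b = [1] + [0] * n   # connection polynomial before the last length change
    length, m = 0, -1   # current LFSR length, index of the last length change
    for i in range(n):
        d = bits[i]     # discrepancy between predicted and actual bit
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d == 0:
            continue
        previous = c[:]
        shift = i - m
        for j in range(n + 1 - shift):
            c[j + shift] ^= b[j]
        if 2 * length <= i:   # length jumps to i + 1 - length, cf. (7.12)
            length, m, b = i + 1 - length, i, previous
    return length

def count_by_complexity(n):
    """counts[L] = number of bit sequences of length n with linear complexity L."""
    counts = [0] * (n + 1)
    for bits in product((0, 1), repeat=n):
        counts[linear_complexity(bits)] += 1
    return counts
```

For $n = 6$ the enumeration yields the counts 1, 2, 8, 32, 16, 4, 1 for $L = 0, \ldots, 6$, in agreement with the corollary, and the sequence $0, \ldots, 0, 1$ of length 8 has linear complexity 8.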




8 Diffie-Hellman Key Exchange



8.1 The Diffie-Hellman System

Here, we will look at another public-key system, namely the Diffie-Hellman
key distribution algorithm (Diffie, Hellman (1976)). It works as follows: Let
$a$ be a fixed non-multiple of a prime $p$. First, Alice chooses her private key
$x_A \in \mathbb{Z}_{p-1}$. She determines her public key $y_A$ by

$$y_A = a^{x_A} \in \mathbb{Z}_p^*.$$

The same is done by Bob. Now, if Alice and Bob want to generate a secret
Diffie-Hellman key, Alice requests Bob's public key $y_B$ from the directory
where it is published and generates the Diffie-Hellman key

$$y_B^{x_A} = a^{x_B x_A}.$$

Bob does the same mutatis mutandis and gets

$$y_A^{x_B} = a^{x_A x_B},$$

which turns out to be the same as Alice's Diffie-Hellman key! The security
of the Diffie-Hellman procedure rests on the discrete logarithm problem.
It is generally believed that solving congruential equations $a^z \equiv b \pmod{n}$
with respect to $z$ (with $a, b, n$ given) is computationally difficult, in a
certain sense perhaps even harder than factoring integers. However, here also,
it has not been proved that solving the discrete logarithm problem is really
necessary for breaking the Diffie-Hellman system. Other cryptosystems based
on the difficulty of the discrete logarithm problem are the ElGamal and the
Massey-Omura system (see Beutelspacher (1993), p. 141). The following
considerations are based on Massey, Waldvogel (1993). See this paper for further
details.
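
The exchange can be illustrated with toy parameters. The sketch below is an illustration only: the prime $p = 23$, the base $a = 5$ (a generator of $\mathbb{Z}_{23}^*$), and the function names are assumptions made here, and the parameters are far too small for any real use.

```python
import secrets


def dh_keypair(p, a):
    """Return (private key x in Z_{p-1}, public key a^x mod p)."""
    x = secrets.randbelow(p - 1)
    return x, pow(a, x, p)


def dh_shared(p, own_private, other_public):
    """Diffie-Hellman key: the other party's public key raised to one's own private key."""
    return pow(other_public, own_private, p)


p, a = 23, 5  # toy values chosen for illustration: 23 = 2 * 11 + 1, and 5 generates Z_23^*
x_a, y_a = dh_keypair(p, a)  # Alice
x_b, y_b = dh_keypair(p, a)  # Bob
k_alice = dh_shared(p, x_a, y_b)
k_bob = dh_shared(p, x_b, y_a)
```

Both parties end up with $a^{x_A x_B} \bmod p$ although neither private key is ever transmitted.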



8.2 Distribution of Diffie-Hellman Keys

In this section, we want to give some information about the probability
distribution of the keys in the Diffie-Hellman system. We start with the general
result and then we will show how the modulus parameter $p$ should be chosen
to get a "good" distribution (i.e., an "almost equidistribution") of the
Diffie-Hellman keys.

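Theorem 8.1. Let $p$ be a prime and denote by $p - 1 = \prod_{i=1}^K p_i^{e_i}$ the prime
factorization of $p - 1$. Furthermore, suppose $t \in \mathbb{Z}_{p-1}$ and $a$ a generator of
$\mathbb{Z}_p^*$. Let

$$m(a^t) = \prod_{i=1}^K p_i^{\varepsilon_i}$$

($0 \le \varepsilon_i \le e_i$) be the multiplicative order of $a^t$. Put

$$R(t) := |\{(x_A, x_B) \in \mathbb{Z}_{p-1} \times \mathbb{Z}_{p-1} : a^{x_A x_B} = a^t\}|.$$

Then we have

$$R(t) = \prod_{i=1}^K p_i^{e_i - 1}\bigl((p_i - 1)(e_i - \varepsilon_i + 1) + \delta(\varepsilon_i)\bigr)$$

(where $\delta(0) := 1$ and $\delta(\varepsilon) := 0$ ($\varepsilon \ne 0$)).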
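
For small primes the statement of Theorem 8.1 can be compared with a direct count. The sketch below is an illustration with names chosen here: it uses that, for a generator $a$, the condition $a^{x_A x_B} = a^t$ is equivalent to $x_A x_B \equiv t \pmod{p-1}$, and that the order of $a^t$ equals $(p-1)/\gcd(t, p-1)$.

```python
from math import gcd


def factorize(n):
    """Trial-division factorization, returned as {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def r_bruteforce(p, t):
    """Count the pairs (x_A, x_B) in Z_{p-1} x Z_{p-1} with x_A * x_B = t mod (p-1)."""
    n = p - 1
    return sum(1 for xa in range(n) for xb in range(n) if (xa * xb - t) % n == 0)


def r_formula(p, t):
    """Right-hand side of Theorem 8.1."""
    n = p - 1
    order = n // gcd(t, n)  # multiplicative order of a^t for a generator a
    result = 1
    for q, e in factorize(n).items():
        eps = 0
        while order % q == 0:  # exponent of q in the order of a^t
            order //= q
            eps += 1
        delta = 1 if eps == 0 else 0
        result *= q ** (e - 1) * ((q - 1) * (e - eps + 1) + delta)
    return result
```

Calling `r_bruteforce(p, t)` and `r_formula(p, t)` for all `t` in `range(p - 1)` gives a quick consistency check; the brute-force count needs $(p-1)^2$ steps, so it is only practical for small $p$.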

For the proof of Theorem 8.1 we need some algebraic and combinatoric prepa- 
rations. We start with the following easy lemma (see, e.g., Hardy, Wright 
(1960), Theorem 57, Massey, Waldvogel (1993), Lemma 1): 



Lemma 8.1. The equation

$$x_A x_B = t$$

over $\mathbb{Z}_{p-1}$ has solutions for $x_B$ iff (over $\mathbb{Z}$) we have that $\gcd(x_A, p-1) \mid t$;
moreover, in the latter case, the number of solutions for $x_B$ is $\gcd(x_A, p-1)$.



So one can write

$$R(t) = \sum_{x_A \in S(t)} \gcd(x_A, p-1), \qquad (8.1)$$

where

$$S(t) := \{u \in \mathbb{Z}_{p-1} : \gcd(u, p-1) \mid t\}.$$

Lemma 8.2. Assume $t \in \mathbb{Z}_{p-1}$. Then for all $u \in \mathbb{Z}$ we have that
$\gcd(u, p-1) \mid t$ iff $\gcd(u, p-1) \mid \gcd(t, p-1)$.

Proof: If $\gcd(u, p-1) \mid t$, then (since also $\gcd(u, p-1) \mid p-1$) it follows that
$\gcd(u, p-1) \mid \gcd(t, p-1)$. On the other hand, if $\gcd(u, p-1) \mid \gcd(t, p-1)$,
then (since $\gcd(t, p-1) \mid t$), it follows that $\gcd(u, p-1) \mid t$. □

Hence

$$S(t) = \{u \in \mathbb{Z}_{p-1} : \gcd(u, p-1) \mid \gcd(t, p-1)\}. \qquad (8.2)$$

By a fact from elementary algebra (see Lidl, Niederreiter (1986), Theorem
1.15ii)), we have

$$m(a^t) = \frac{p-1}{\gcd(t, p-1)}. \qquad (8.3)$$

If we substitute this into (8.2), we obtain

$$S(t) = \Bigl\{u \in \mathbb{Z}_{p-1} : \gcd(u, p-1) \Bigm| \frac{p-1}{m(a^t)}\Bigr\}. \qquad (8.4)$$

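Decompose $p - 1$ into prime factors: $p - 1 = \prod_{i=1}^K p_i^{e_i}$ (with $e_i \ge 1$ and $p_i$
distinct prime factors). Lagrange's Theorem, applied to the subgroup of $\mathbb{Z}_p^*$
generated by $a^t$, yields

$$m(a^t) = \prod_{i=1}^K p_i^{\varepsilon_i}$$

with $0 \le \varepsilon_i \le e_i$. So if we substitute

$$\frac{p-1}{m(a^t)} = \prod_{i=1}^K p_i^{e_i - \varepsilon_i}$$

into (8.4), we obtain

$$S(t) = \Bigl\{u \in \mathbb{Z}_{p-1} : \gcd(u, p-1) \Bigm| \prod_{i=1}^K p_i^{e_i - \varepsilon_i}\Bigr\}.$$

If we consider the prime factorization

$$\gcd(u, p-1) = \prod_{i=1}^K p_i^{c_i},$$

then this rewrites as

$$S(t) = \Bigl\{u \in \mathbb{Z}_{p-1} : \gcd(u, p-1) = \prod_{i=1}^K p_i^{c_i},\ 0 \le c_i \le e_i - \varepsilon_i\Bigr\},$$

which, substituted into (8.1), yields

$$\begin{aligned}
R(t) &= \sum_{c_1=0}^{e_1-\varepsilon_1} \cdots \sum_{c_K=0}^{e_K-\varepsilon_K}\ \sum_{b \in T(c_1,\ldots,c_K)}\ \prod_{i=1}^K p_i^{c_i} \\
&= \sum_{c_1=0}^{e_1-\varepsilon_1} \cdots \sum_{c_K=0}^{e_K-\varepsilon_K}\ \prod_{i=1}^K p_i^{c_i} \sum_{b \in T(c_1,\ldots,c_K)} 1 \\
&= \sum_{c_1=0}^{e_1-\varepsilon_1} \cdots \sum_{c_K=0}^{e_K-\varepsilon_K}\ \prod_{i=1}^K p_i^{c_i}\,|T(c_1,\ldots,c_K)|, \qquad (8.5)
\end{aligned}$$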

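where

$$T(c_1,\ldots,c_K) := \Bigl\{u \in \mathbb{Z}_{p-1} : \gcd(u, p-1) = \prod_{i=1}^K p_i^{c_i}\Bigr\}. \qquad (8.6)$$

Lemma 8.3. For $b \in \mathbb{Z}$ it holds that $b \in T(c_1,\ldots,c_K)$ iff $b \in \mathbb{Z}_{p-1}$ and

$$m(a^b) = \prod_{i=1}^K p_i^{e_i - c_i}.$$

Proof: If we substitute

$$\prod_{i=1}^K p_i^{c_i} = \frac{p-1}{\prod_{i=1}^K p_i^{e_i - c_i}}$$

into (8.6), we get

$$T(c_1,\ldots,c_K) = \Bigl\{u \in \mathbb{Z}_{p-1} : \gcd(u, p-1) = \frac{p-1}{\prod_{i=1}^K p_i^{e_i - c_i}}\Bigr\}.$$

This implies that $b \in T(c_1,\ldots,c_K)$ iff $b \in \mathbb{Z}_{p-1}$ and

$$\frac{p-1}{\gcd(b, p-1)} = \prod_{i=1}^K p_i^{e_i - c_i}. \qquad (8.7)$$

From relation (8.3), (8.7) holds iff

$$m(a^b) = \prod_{i=1}^K p_i^{e_i - c_i}. \qquad (8.8)$$

So $b \in T(c_1,\ldots,c_K)$ iff $b \in \mathbb{Z}_{p-1}$ and (8.8) holds. □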

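Proof of Theorem 8.1: Lemma 8.3 yields that $|T(c_1,\ldots,c_K)|$ is exactly
the number of elements in $\mathbb{Z}_p^*$ with multiplicative order $\prod_{i=1}^K p_i^{e_i - c_i}$. It is
known (see e.g. Lidl, Niederreiter (1986), Theorem 1.15) that this number
is $\varphi(\prod_{i=1}^K p_i^{e_i - c_i})$ ($\varphi$ denoting the Euler totient function). If we substitute
this into (8.5) and use the multiplicativity of the Euler totient function $\varphi$ for
relatively prime elements and the fact that $\varphi(p^e) = p^{e-1}(p - 1)$ for $e \ge 1$, we obtain

$$\begin{aligned}
R(t) &= \sum_{c_1=0}^{e_1-\varepsilon_1} \cdots \sum_{c_K=0}^{e_K-\varepsilon_K}\ \prod_{i=1}^K p_i^{c_i} \prod_{i=1}^K \varphi(p_i^{e_i - c_i}) \\
&= \prod_{i=1}^K\ \sum_{c_i=0}^{e_i-\varepsilon_i} p_i^{c_i}\,\varphi(p_i^{e_i - c_i}) \\
&= \prod_{i=1}^K \Bigl(p_i^{e_i}\delta(\varepsilon_i) + \sum_{c_i=0}^{e_i-\varepsilon_i-\delta(\varepsilon_i)} p_i^{c_i}\,\varphi(p_i^{e_i - c_i})\Bigr) \\
&= \prod_{i=1}^K \Bigl(p_i^{e_i}\delta(\varepsilon_i) + \sum_{c_i=0}^{e_i-\varepsilon_i-\delta(\varepsilon_i)} p_i^{c_i}\,p_i^{e_i - c_i - 1}(p_i - 1)\Bigr) \\
&= \prod_{i=1}^K \Bigl(p_i^{e_i}\delta(\varepsilon_i) + p_i^{e_i - 1}(p_i - 1) \sum_{c_i=0}^{e_i-\varepsilon_i-\delta(\varepsilon_i)} 1\Bigr) \\
&= \prod_{i=1}^K \bigl(p_i^{e_i}\delta(\varepsilon_i) + p_i^{e_i - 1}(p_i - 1)(e_i - \varepsilon_i - \delta(\varepsilon_i) + 1)\bigr) \\
&= \prod_{i=1}^K p_i^{e_i - 1}\bigl((p_i - 1)(e_i - \varepsilon_i + 1) + \delta(\varepsilon_i)\bigr).
\end{aligned}$$

(In the third line, the term $c_i = e_i$, which occurs only if $\varepsilon_i = 0$, is split off
because $\varphi(p_i^0) = 1$.) □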

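By a simple calculation one deduces from Theorem 8.1:

Corollary 8.1. If Alice and Bob choose their private keys $X_A$, resp. $X_B$,
independently and uniformly at random in $\mathbb{Z}_{p-1}$, then

$$P(a^{X_A X_B} = a^t) = \frac{1}{p-1} \prod_{i=1}^K \Bigl(\frac{p_i - 1}{p_i}(e_i - \varepsilon_i + 1) + \frac{\delta(\varepsilon_i)}{p_i}\Bigr). \qquad (8.9)$$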

Let $P_{\min}$ resp. $P_{\max}$ be the minimum, resp. maximum, possible value (over
all $t \in \mathbb{Z}_{p-1}$) of the expression $P(a^{X_A X_B} = a^t)$. From Corollary 8.1 one sees
that the minimum value is attained if $\varepsilon_i = e_i$ for all $i = 1, 2, \ldots, K$, i.e. if
$m(a^t) = p - 1$. This yields

Corollary 8.2.

$$P_{\min} = \frac{1}{p-1} \prod_{i=1}^K \frac{p_i - 1}{p_i} = \frac{1}{p-1} \prod_{i=1}^K \Bigl(1 - \frac{1}{p_i}\Bigr).$$

This means that $P_{\min}$ is smaller than the average key probability by only
a small factor. On the other hand, the probability $P(a^{X_A X_B} = a^t)$ becomes
maximum if $\varepsilon_1 = \varepsilon_2 = \ldots = \varepsilon_K = 0$, i.e., if $m(a^t) = 1$. Hence

Corollary 8.3.

$$P_{\max} = \frac{1}{p-1} \prod_{i=1}^K \Bigl(e_i\,\frac{p_i - 1}{p_i} + 1\Bigr).$$



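One can also show:

Corollary 8.4. If Alice and Bob choose their private keys independently and
uniformly in $\mathbb{Z}_{p-1}^*$, then the Diffie-Hellman keys are uniformly distributed in
$\{a^t : t \in \mathbb{Z}_{p-1}^*\}$, i.e.,

$$P(a^{X_A X_B} = a^t) = \frac{1}{\varphi(p-1)} \qquad (t \in \mathbb{Z}_{p-1}^*)$$

and zero else.

Proof: Let

$$\begin{aligned}
R^*(t) &:= \{(x_A, x_B) \in (\mathbb{Z}_{p-1}^*)^2 : a^{x_A x_B} = a^t\} \\
&= \{(x_A, x_B) \in (\mathbb{Z}_{p-1}^*)^2 : x_A x_B \equiv t \pmod{p-1}\} \\
&= \{(x_A, x_A^{-1} t) : x_A \in \mathbb{Z}_{p-1}^*\}.
\end{aligned}$$

So

$$|R^*(t)| = \varphi(p-1). \qquad (8.10)$$

Furthermore,

$$P(a^{X_A X_B} = a^t) = \frac{|R^*(t)|}{\varphi(p-1)^2}. \qquad (8.11)$$

Substituting (8.10) into (8.11) yields the assertion. □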



8.3 Strong Primes 

In accordance with Massey, Waldvogel (1993), paragraph 4, we will call a 
prime p a strong prime if it is of the form p = 2q + 1 with q prime. From 
Corollaries 8.2 and 8.3 we obtain, when p is a large strong prime: 



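$$P_{\min} = \frac{1}{p-1} \cdot \frac{1}{2} \cdot \frac{q-1}{q} \approx \frac{1}{2} \cdot \frac{1}{p-1}$$

and

$$P_{\max} = \frac{1}{p-1}\Bigl(\frac{1}{2} + 1\Bigr)\Bigl(\frac{q-1}{q} + 1\Bigr) = \frac{1}{p-1} \cdot \frac{3}{2} \cdot \Bigl(2 - \frac{1}{q}\Bigr) \approx \frac{3}{p-1}.$$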

So in this case, Pmin and Pmax are of the same order of magnitude (namely 
the average key probability). One can also show the following relationship 
with the entropy: 



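Corollary 8.5. If $p$ is a strong prime, then

$$\log_2(p-1) - 2 < H(a^{X_A X_B}) \le \log_2(p-1). \qquad (8.12)$$

Proof: With the aid of Corollary 8.3 we calculate

$$\begin{aligned}
H(a^{X_A X_B}) &= -\sum_{t \in \mathbb{Z}_{p-1}} P(a^{X_A X_B} = a^t) \log_2 P(a^{X_A X_B} = a^t) \\
&\ge -\log_2 P_{\max} \sum_{t \in \mathbb{Z}_{p-1}} P(a^{X_A X_B} = a^t) \\
&= -\log_2 P_{\max} \\
&= -\log_2\Bigl(\frac{1}{p-1} \prod_{i=1}^K \Bigl(e_i\,\frac{p_i - 1}{p_i} + 1\Bigr)\Bigr) \\
&= \log_2(p-1) - \sum_{i=1}^K \log_2\Bigl(e_i\,\frac{p_i - 1}{p_i} + 1\Bigr) \\
&> \log_2(p-1) - \sum_{i=1}^K \log_2(e_i + 1).
\end{aligned}$$

For a strong prime we have $K = 2$ and $e_1 = e_2 = 1$, so the last sum equals 2.
This proves the left member of inequality (8.12). The right member is trivial,
since it indicates the maximum possible entropy. □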
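
The bounds of Corollaries 8.2, 8.3, and 8.5 can be observed directly for a small strong prime. The following sketch is an illustration only; the choice $p = 23 = 2 \cdot 11 + 1$ with generator $a = 5$ and the function names are assumptions made here. It tabulates the exact key distribution when both private keys are uniform on $\mathbb{Z}_{p-1}$ and computes its entropy.

```python
from collections import Counter
from math import log2


def key_distribution(p, a):
    """Exact distribution of a^(x_A * x_B) mod p for x_A, x_B uniform on Z_{p-1}."""
    n = p - 1
    counts = Counter(pow(a, (xa * xb) % n, p) for xa in range(n) for xb in range(n))
    return {key: count / n ** 2 for key, count in counts.items()}


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(prob * log2(prob) for prob in dist.values())


p, a = 23, 5  # toy strong prime and generator, chosen for illustration
dist = key_distribution(p, a)
p_min, p_max = min(dist.values()), max(dist.values())
h = entropy(dist)
```

For these parameters the most likely key is $1 = a^0$, and `h` lies between $\log_2(p-1) - 2$ and $\log_2(p-1)$, as Corollary 8.5 predicts.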

This shows that for large strong primes, the entropy of the Diffie-Hellman key
is practically the maximum possible; in other words, strong primes are a
very good choice. Without proof we state Corollary 4 of Massey, Waldvogel (1993),
which indicates which primes $p$ are worst in the sense that they give large
values of $P_{\max}$.

Theorem 8.2. If $p - 1$ has the prime factorization $p - 1 = \prod_{i=1}^K p_i^{e_i}$, then
an approximate upper bound for $P_{\max}$ is given by the expression

1 log(p-l) -14 1 

P-V K log Qi ’ 

where qt denotes the i-th prime, 

log(p - 1) _ , 

^ e(loglog(p- 1) - 1) 

K K, 



Pi = di, 

^,log^,(p-l) 
a ~ , 

Hi 

and [x] means the rounded value of the real number x to the next integer. 



The proof of Theorem 8.2 makes use of the Prime number Theorem, which 
states that 



log(?fc 



(fc 



oo). 



(In fact, Cebysev’s weak form of it suffices.) 

For $p \to \infty$, it was shown in Canetti et al. (1999) that the Diffie-Hellman
triples $(a^x, a^y, a^{xy})$ ($a$ a primitive root modulo $p$) are uniformly
distributed in the sense of Weyl, i.e. interpreted (in the standard way) as
elements of the 3-dimensional unit cube $[0,1]^3$. Their proof is based on estimates
for exponential sums and the number of solutions of exponential equations.




9 Differential Cryptanalysis 



9.1 The Principle 

So-called differential cryptanalysis belongs to the class of chosen-plaintext
attacks and was invented by Biham and Shamir (1991). It is a method of
cryptanalysis for block ciphers (in contrast to stream ciphers). (In order to
avoid misunderstandings from the beginning, note that the term "differential"
is used because differences of elements of a commutative group $G$ will
be compared and it has nothing to do with calculus!) Let us describe the
setting in detail. An $r$-round block cipher is an encryption algorithm that
works as follows: For the first round, given an input $X(1)$ and a round
key $Z^{(1)}$, the (deterministic) "enciphering function" $f$ produces an output
$Y(1) = f(X(1), Z^{(1)})$. The output of the first round is used as input for
the second round $X(2) := Y(1)$, and as output of the second round we get
$Y(2) = f(X(2), Z^{(2)})$, etc. The final output of the algorithm will be the
output of the $r$-th round $Y(r)$. Here, all occurring data are blocks of a certain
length whose elements belong to some finite abelian group, in practice
often some residue ring. The model assumption will be that all round keys
$Z^{(1)}, Z^{(2)}, \ldots, Z^{(r)}$ are chosen as independent uniformly distributed random
variables, for in general, only in this case do reasonable theoretic results
become available. But interestingly enough, in practice, it seems to work as
well or even better when the round keys are determined by some key schedule
for a "small" overall key. Now the idea of differential cryptanalysis is
that if one takes pairs of round inputs $(X(i), X^*(i))$ and compares them with
the round output pairs $(Y(i), Y^*(i))$, often there are relations between their
differences $\Delta X(i) := X(i) - X^*(i)$ and $\Delta Y(i) := Y(i) - Y^*(i)$ that allow
us to infer information on the round key $Z^{(i)}$. Informally speaking, the
enciphering function $f$ is called cryptographically weak if for given $\Delta Y(r-1)$,
$Y(r)$, $Y^*(r)$ for a relatively small number of input pairs $(X(1), X^*(1))$, one
can "easily" find the round key $Z^{(r)}$ or at least some information about it.
A pair of differences $(\alpha, \beta)$ considered as values of a pair of first-round input
and $i$-th-round output $(\alpha, \beta) = (\Delta X(1), \Delta Y(i))$ is termed an $i$-round
differential (or characteristic). Differential cryptanalysis is successful if there are
differentials that are significantly more probable than others if the round keys
$Z^{(1)}, Z^{(2)}, \ldots, Z^{(r)}$ are chosen uniformly at random. Now the differential
attack proceeds as follows:

D. Neuenschwander: Prob. and Stat. Methods in Cryptology, LNCS 3028, pp. 115-123, 2004.

- Choose an $(r-1)$-round differential $(\alpha, \beta)$ for which the conditional
probability $P(\Delta Y(r-1) = \beta \mid \Delta X(1) = \alpha)$ is relatively large.

- Take a plaintext $X(1)$ chosen uniformly at random and encrypt $X(1)$ and
$X^*(1) := X(1) + \alpha$ to get the ciphertexts $Y(r)$ and $Y^*(r)$.

- Assume that $\beta$ is the true difference $\Delta Y(r-1)$. Find all values of the round
key $Z^{(r)}$ that are consistent with $r$-th-round input difference $\beta$ and output
difference $\Delta Y(r) = Y(r) - Y^*(r)$.

- Repeat the two preceding steps until some possible $Z^{(r)}$ appears
significantly more frequently than all the others. Then use this value as a guess
for the $r$-th round key.

- Do all these steps iteratively for $r-1, r-2, \ldots, 1$.

The creative act needed to mount a differential attack lies in the first step, 
i.e., to find a significantly more probable differential. This is why information 
about the distribution of differentials is important. We will treat this question 
in the next section. 
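
The search for a probable differential can be made concrete for a single small substitution box. The sketch below is an illustration under assumptions made here, not a construction from the text: the group is $G = \mathbb{Z}_2^4$ (so differences are bitwise XOR), the 4-bit permutation `SBOX` is an arbitrary example, and the function names are hypothetical. It tabulates, for every input difference $\alpha$, how often each output difference $\beta$ occurs.

```python
# Example 4-bit permutation, used purely for illustration.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def difference_table(sbox):
    """table[alpha][beta] = number of inputs x with sbox[x] ^ sbox[x ^ alpha] == beta."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for x in range(size):
        for alpha in range(size):
            beta = sbox[x] ^ sbox[x ^ alpha]
            table[alpha][beta] += 1
    return table


def best_differential(table):
    """Most frequent (count, alpha, beta) among nonzero input and output differences."""
    size = len(table)
    return max((table[a][b], a, b) for a in range(1, size) for b in range(1, size))


table = difference_table(SBOX)
count, alpha, beta = best_differential(table)
```

Dividing `count` by `len(SBOX)` gives the probability of the differential $(\alpha, \beta)$ for a uniformly random input; an attacker would pick the entries where this value is far above the uniform level.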

Fortunately, by the following theorem due to Lai, Massey, and Murphy, there
is a lower bound on the complexity of a differential attack. Here, "complexity"
means the number of times an encryption of a chosen plaintext pair must be
made.

Theorem 9.1. Let $G$ be an abelian group (in particular, a residue ring), $N$
be the block length, and put

$$p_{\max} := \max_{\alpha, \beta} P(\Delta Y(r-1) = \beta \mid \Delta X = \alpha).$$

Then the average complexity $C$ of the differential cryptanalysis has the
following lower bound:

$$C \ge \frac{1}{p_{\max} - \frac{1}{2^N - 1}}.$$

Proof: If the attack succeeds, then the anticipated value $\beta$ has to occur at
least once more than a uniformly randomly chosen other $\beta'$. In $K$ pairs of
encryptions, on the average $\beta$ occurs $K p_{\max}$ and $\beta'$ occurs $K(2^N - 1)^{-1}$ times.
Thus

$$K p_{\max} - \frac{K}{2^N - 1} \ge 1,$$

which, by resolving with respect to $K$, yields the assertion. □

So, the smaller $p_{\max}$ (i.e., the fewer differentials there are that are significantly
more probable than the others), the bigger the complexity becomes.
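
To get a feeling for the orders of magnitude, the inequality $K p_{\max} - K/(2^N - 1) \ge 1$ from the proof can be solved for the smallest admissible number of pairs $K$. The helper below is a small sketch with a name chosen here; it uses exact rational arithmetic and returns `None` when the best differential is no more probable than a uniform one, in which case no $K$ satisfies the inequality.

```python
from fractions import Fraction
from math import ceil


def min_pairs(p_max, block_length):
    """Smallest integer K with K * p_max - K / (2^N - 1) >= 1, or None if none exists."""
    margin = Fraction(p_max) - Fraction(1, 2 ** block_length - 1)
    if margin <= 0:
        return None  # the best differential is no better than a uniform one
    return ceil(1 / margin)
```

For example, `min_pairs(Fraction(1, 4), 4)` solves $K(1/4 - 1/15) \ge 1$, and `min_pairs(Fraction(1, 2**40), 64)` shows that a differential of probability $2^{-40}$ on a 64-bit block requires slightly more than $2^{40}$ pairs.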

Of course, the cardinal question here is how to design a cipher that, against 
differential cryptanalysis, is reasonably secure. It turns out that for this, the 
notion of a Markov cipher seems to be a natural condition. The following 
definition is due to Lai, Massey, and Murphy. 

Definition 9.1. An $r$-round iterated block cipher is called a Markov cipher
if, when the first round key $Z^{(1)}$ is chosen uniformly at random, then the
probability

$$P(\Delta Y(1) = \beta \mid \Delta X(1) = \alpha, X(1) = \gamma)$$

is independent of $\gamma$ for all $\alpha, \beta, \gamma$.

To be exact, we need the model assumption of stochastic equivalence:

Definition 9.2. The assumption of stochastic equivalence means that
$P(\Delta Y(r-1) = \beta \mid \Delta X(1) = \alpha)$ has the same value for fixed round keys
$Z^{(1)}, \ldots, Z^{(r-1)}$ as if these round keys $Z^{(i)}$ ($i = 1, 2, \ldots, r-1$) were
independent and uniformly distributed.

As Biham and Shamir have shown, e.g., DES is a Markov cipher. The relation
of the above definition to Markov chains is the following theorem due to Lai,
Massey, and Murphy:

Theorem 9.2. If in an $r$-round Markov cipher, all round keys are chosen
independently and uniformly at random, then $\{\Delta Y(i)\}_{0 \le i \le r}$ is a Markov chain.

(Here, the term "Markov chain" will always mean "homogeneous" Markov
chain.)

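Proof of Theorem 9.2: We have

$$\begin{aligned}
&P(\Delta Y(1) = \beta_1, \Delta Y(2) = \beta_2, \ldots, \Delta Y(r) = \beta_r \mid \Delta Y(0) = \beta_0) \\
&\quad = \prod_{i=1}^r P(\Delta Y(i) = \beta_i \mid \Delta Y(0) = \beta_0, \Delta Y(1) = \beta_1, \ldots, \Delta Y(i-1) = \beta_{i-1}).
\end{aligned}$$

However,

$$\begin{aligned}
&P(\Delta Y(i) = \beta_i \mid \Delta Y(0) = \beta_0, \Delta Y(1) = \beta_1, \ldots, \Delta Y(i-1) = \beta_{i-1}) \\
&\quad = \sum_{\gamma \in G} P(\Delta Y(i) = \beta_i, Y(i-1) = \gamma \mid \Delta Y(0) = \beta_0, \Delta Y(1) = \beta_1, \ldots, \Delta Y(i-1) = \beta_{i-1})
\end{aligned}$$

and

$$\begin{aligned}
&P(\Delta Y(i) = \beta_i, Y(i-1) = \gamma \mid \Delta Y(0) = \beta_0, \Delta Y(1) = \beta_1, \ldots, \Delta Y(i-1) = \beta_{i-1}) \\
&\quad = P(Y(i-1) = \gamma \mid \Delta Y(0) = \beta_0, \Delta Y(1) = \beta_1, \ldots, \Delta Y(i-1) = \beta_{i-1}) \\
&\qquad \cdot P(\Delta Y(i) = \beta_i \mid Y(i-1) = \gamma, \Delta Y(0) = \beta_0, \Delta Y(1) = \beta_1, \ldots, \Delta Y(i-1) = \beta_{i-1}).
\end{aligned}$$

By the independence of the round keys and the definition of a Markov cipher,
we have

$$\begin{aligned}
&P(\Delta Y(i) = \beta_i \mid Y(i-1) = \gamma, \Delta Y(0) = \beta_0, \Delta Y(1) = \beta_1, \ldots, \Delta Y(i-1) = \beta_{i-1}) \\
&\quad = P(\Delta Y(i) = \beta_i \mid Y(i-1) = \gamma, \Delta Y(i-1) = \beta_{i-1}) \\
&\quad = P(\Delta Y(i) = \beta_i \mid \Delta Y(i-1) = \beta_{i-1}).
\end{aligned}$$

So we get

$$\begin{aligned}
&P(\Delta Y(i) = \beta_i \mid \Delta Y(0) = \beta_0, \Delta Y(1) = \beta_1, \ldots, \Delta Y(i-1) = \beta_{i-1}) \\
&\quad = P(\Delta Y(i) = \beta_i \mid \Delta Y(i-1) = \beta_{i-1}) \\
&\qquad \cdot \sum_{\gamma \in G} P(Y(i-1) = \gamma \mid \Delta Y(0) = \beta_0, \Delta Y(1) = \beta_1, \ldots, \Delta Y(i-1) = \beta_{i-1}).
\end{aligned}$$

Since the latter sum adds up to 1, we finally obtain

$$\begin{aligned}
&P(\Delta Y(1) = \beta_1, \Delta Y(2) = \beta_2, \ldots, \Delta Y(r) = \beta_r \mid \Delta Y(0) = \beta_0) \\
&\quad = \prod_{i=1}^r P(\Delta Y(i) = \beta_i \mid \Delta Y(i-1) = \beta_{i-1}).
\end{aligned}$$

Homogeneity follows from the fact that all round keys have the same (uniform)
distribution. □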



Lemma 9.1. For any Markov cipher, the uniform distribution on $G^N \setminus \{e\}$
is a stationary distribution of the Markov chain $\{\Delta Y(i)\}_{0 \le i \le r}$.

Proof: Put $Y(i) = X(i+1) = e$ and choose $Y^*(i) = X^*(i+1)$ uniformly on
$G^N \setminus \{e\}$ at random. Then, since the cipher is Markov, the random variable
$\Delta Y(i)$ obeys itself a uniform distribution on $G^N \setminus \{e\}$. For any fixed
$(i+1)$-th round key $z = Z^{(i+1)}$, the random variable $Y^*(i+1) = f(X^*(i+1), z)$
is uniformly distributed on $G^N \setminus \{f(e, z)\}$, since $f(\cdot, z)$ is invertible. Thus for
fixed $z$, the random variable $\Delta Y(i+1)$ is uniformly distributed over $G^N \setminus \{e\}$.
Hence the same is also true without conditioning on $Z^{(i+1)}$. □
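
Both the Markov property and the stationarity of the uniform distribution can be verified exhaustively for a toy round function. The sketch below rests on assumptions made here for illustration: $G^N = \mathbb{Z}_2^4$ with XOR as group operation, the round function $f(x, z) = S(x \oplus z)$ with an arbitrary example permutation `SBOX`, and hypothetical function names.

```python
from fractions import Fraction

# Example 4-bit permutation, used purely for illustration.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
SIZE = len(SBOX)


def round_function(x, z):
    """Toy round: add the round key, then apply the substitution box."""
    return SBOX[x ^ z]


def transition_row(alpha, x):
    """Distribution of the output difference for input x, input difference alpha,
    and a uniformly random round key."""
    row = [Fraction(0)] * SIZE
    for z in range(SIZE):
        beta = round_function(x, z) ^ round_function(x ^ alpha, z)
        row[beta] += Fraction(1, SIZE)
    return row


def is_markov():
    """The output-difference distribution must not depend on the input x."""
    return all(transition_row(alpha, x) == transition_row(alpha, 0)
               for alpha in range(SIZE) for x in range(SIZE))


def uniform_is_stationary():
    """One step of the difference chain maps the uniform law on nonzero differences to itself."""
    nonzero = range(1, SIZE)
    uniform = Fraction(1, SIZE - 1)
    return all(sum(uniform * transition_row(alpha, 0)[beta] for alpha in nonzero) == uniform
               for beta in nonzero)
```

Both `is_markov()` and `uniform_is_stationary()` hold for this toy cipher; the second check succeeds because the round function is invertible for every fixed key, which is exactly the argument used in the proof of Lemma 9.1.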

A stronger notion than a stationary probability measure of a Markov chain is 
the concept of a so-called steady-state distribution. This means the following: 

Definition 9.3. The Markov chain $\{\Delta Y(i)\}_{i \ge 0}$ is said to have the
steady-state distribution $\pi$ if for all $\alpha, \beta$ it holds that

$$P(\Delta Y(i) = \beta \mid \Delta Y(0) = \alpha) \to \pi(\beta) \qquad (i \to \infty).$$

If a Markov chain has a steady-state distribution, then this is its unique
stationary distribution. Now by the following theorem due to Lai, Massey, and
Murphy, it turns out that Markov ciphers having a steady-state distribution
are "immune" to differential cryptanalysis.

Theorem 9.3. Under the assumption of stochastic equivalence, Markov ci- 
phers having a steady-state distribution are (asymptotically as the number of 
rounds tends to infinity) immune to differential cryptanalysis (in the sense 
that the average complexity tends to oo). 



Proof: From Theorem 9.1 and the fact that, from Lemma 9.1, $p_{\max} \to \frac{1}{2^N - 1}$,
we have $C \to \infty$ as the number of rounds tends to infinity. □





9.2 The Distribution of Characteristics 

As mentioned in Section 9.1, it is important to know the distribution of
differentials under the null hypothesis that the keys are chosen uniformly at
random. In this section, we will work in somewhat more generality in the
sense that we will look at additive characteristics in powers of groups $\mathbb{Z}_q$.
For $q = 2$ this is just classical differential cryptanalysis, since in this case, $+$
and $-$ are the same. Let $q \in \mathbb{N}$, $q \ge 2$ and fix $\Delta X, \Delta Y \in (\mathbb{Z}_q)^m$. Let $\pi$ be
a uniformly distributed random permutation of $(\mathbb{Z}_q)^m$ (which occurs due to
a randomly chosen key $K$) and consider the random variable $\Lambda_\pi(\Delta X, \Delta Y)$
giving the number of (unordered) pairs $\{X, X'\} \subset (\mathbb{Z}_q)^m$ of plaintexts $X, X'$
such that $X + X' = \Delta X$ and $Y + Y' = \Delta Y$, where $Y = \pi(X)$, $Y' = \pi(X')$
are the corresponding ciphertexts.

We begin with an elementary algebraic lemma, whose proof follows from
standard properties of linear diophantine equations.

Lemma 9.2. Let $q \in \mathbb{N}$, $q \ge 2$ and $k \in \mathbb{Z}$. Consider the equation

$$2x \equiv k \pmod{q} \qquad (x \in \mathbb{Z}_q). \qquad (9.1)$$

If q is odd, then (9.1) has exactly one solution mod. q. If q is even and k 
is odd, then (9.1) has no solution. If q and k are both even, then (9.1) has 
exactly two solutions mod. q. 
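
The three cases of the lemma are easy to confirm by enumeration. The helper below is a small illustrative sketch with a name chosen here.

```python
def solutions_of_2x(q, k):
    """All x in Z_q with 2x = k (mod q)."""
    return [x for x in range(q) if (2 * x - k) % q == 0]
```

For odd $q$ the list always has one element, for even $q$ and odd $k$ it is empty, and for even $q$ and even $k$ it contains two elements that differ by $q/2$.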

The next lemma is the so-called "pairing theorem", a combinatorial assertion.

Lemma 9.3. Let $A = \{a_1, a_2, \ldots, a_{2d}\}$ and $B = \{b_1, b_2, \ldots, b_{2d}\}$ be
alphabets with $2d$ distinct elements. Assume $\Pi_A$ and $\Pi_B$ are sets of unordered
pairs such that $a_i$ (resp. $b_i$) occurs in exactly one pair of $\Pi_A$ resp. $\Pi_B$
($i = 1, 2, \ldots, 2d$). Denote by $\Gamma(d)$ the number of bijections $\varphi : A \to B$ such
that for all pairs $\{a_i, a_j\} \in \Pi_A$ we have $\{\varphi(a_i), \varphi(a_j)\} \notin \Pi_B$. Then we have

$$\Gamma(d) = \sum_{k=0}^d (-1)^k \binom{d}{k}^2 2^k\, k!\, (2d - 2k)!. \qquad (9.2)$$

Proof: We order the set $\Pi_B$ as $\{\{b'_i, b'_{i+d}\}\}_{1 \le i \le d}$ and let $P(i)$ be the set
of bijections $\varphi : A \to B$ that map some pair of $\Pi_A$ to the pair $\{b'_i, b'_{i+d}\}$ in
$\Pi_B$, i.e.,

$$P(i) := \{\varphi : \varphi(a) = b'_i,\ \varphi(a') = b'_{i+d},\ \{a, a'\} \in \Pi_A\}.$$

By the principle of inclusion-exclusion we get

$$\begin{aligned}
\Gamma(d) &= (2d)! - \Bigl|\bigcup_{1 \le j \le d} P(j)\Bigr| \\
&= (2d)! + \sum_{\emptyset \ne S \subset \{1, 2, \ldots, d\}} (-1)^{|S|} \Bigl|\bigcap_{j \in S} P(j)\Bigr|. \qquad (9.3)
\end{aligned}$$

If we define (by symmetry)

$$P(d, k) := |P(1, 2, \ldots, k)| := \Bigl|\bigcap_{1 \le j \le k} P(j)\Bigr|, \qquad (9.4)$$

we obtain the relation

$$\Gamma(d) = (2d)! + \sum_{k=1}^d (-1)^k \binom{d}{k} P(d, k). \qquad (9.5)$$

Now we order $\Pi_A$ in the same way as $\Pi_B$, i.e., as $\{\{a'_i, a'_{i+d}\}\}_{1 \le i \le d}$. Then
$P(d, k)$ can be interpreted as the number of bijections $\varphi : A \to B$ for which
there are $k$ pairs $\{a''_i, a''_{i+d}\}$ from $\Pi_A$ such that $\varphi(a''_i) = b'_i$ and
$\varphi(a''_{i+d}) = b'_{i+d}$ ($i = 1, 2, \ldots, k$). By elementary combinatorial considerations,
it turns out that there exist $\binom{d}{k}$ ways to select the $k$ pairs $\{a''_i, a''_{i+d}\}$ from $\Pi_A$,
then $k!$ possibilities to assign the pairs $\{a''_i, a''_{i+d}\}$ to the pairs $\{b'_i, b'_{i+d}\}$, and
at the end $2^k$ ways to assign the two elements of each $\{a''_i, a''_{i+d}\}$ to the elements
of its particular pair in $\Pi_B$. Finally, the number of ways to assign the elements of

$$A \setminus \bigcup_{1 \le i \le k} \{a''_i, a''_{i+d}\}$$

is given by $(2d - 2k)!$. So

$$P(d, k) = \binom{d}{k} 2^k\, k!\, (2d - 2k)! \qquad (9.6)$$

and the assertion follows from (9.5). □
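
For small $d$ the pairing count can be confirmed by enumerating all bijections. The sketch below is an illustration with names chosen here; it labels both alphabets by $0, \ldots, 2d-1$, pairs the element $i$ with $i + d$ in each of them, and compares the direct count with the inclusion-exclusion sum $\sum_{k=0}^d (-1)^k \binom{d}{k}^2 2^k k! (2d-2k)!$ obtained by inserting (9.6) into (9.5).

```python
from itertools import permutations
from math import comb, factorial


def gamma_formula(d):
    """Inclusion-exclusion sum for the number of pair-avoiding bijections."""
    return sum((-1) ** k * comb(d, k) ** 2 * 2 ** k * factorial(k) * factorial(2 * d - 2 * k)
               for k in range(d + 1))


def gamma_bruteforce(d):
    """Count the bijections that map no pair {i, i+d} onto a pair {j, j+d}."""
    count = 0
    for phi in permutations(range(2 * d)):
        # two images form a pair of the second alphabet iff they differ by exactly d
        if all(abs(phi[i] - phi[i + d]) != d for i in range(d)):
            count += 1
    return count
```

The brute-force loop runs over $(2d)!$ bijections, so it is only feasible for very small $d$; the ratio `gamma_formula(d) / factorial(2 * d)` can be evaluated for larger $d$ to watch it approach $e^{-1/2}$.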

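Theorem 9.4. Suppose $q \in \mathbb{N}$, $q \ge 2$, $q$ even. Let $\Delta X = (\Delta X_1, \Delta X_2, \ldots,
\Delta X_m)$, $\Delta Y = (\Delta Y_1, \Delta Y_2, \ldots, \Delta Y_m) \in (\mathbb{Z}_q)^m$ such that at least one of the
$\Delta X_i$ and at least one of the $\Delta Y_i$ is odd. Then the distribution of the random
variable $\Lambda_\pi(\Delta X, \Delta Y)$ tends to the Poisson distribution given by

$$P(\Pi = k) = e^{-1/2}\, 2^{-k}\, (k!)^{-1} \qquad (k \in \mathbb{N}_0) \qquad (m \to \infty). \qquad (9.7)$$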

Proof: If $X + X' = \Delta X$, then from Lemma 9.2 it is not possible that $X = X'$.
Thus, from Lemma 9.3 the number of permutations $\pi$ of $(\mathbb{Z}_q)^m$ such that if
$X + X' = \Delta X$ then $\pi(X) + \pi(X') \ne \Delta Y$ is given by $\Gamma(q^m/2)$, where

$$\Gamma(d) = \sum_{k=0}^d (-1)^k \binom{d}{k}^2 2^k\, k!\, (2d - 2k)!. \qquad (9.8)$$

We get (for $u$ in a neighborhood of 1), for the generating function of
$\Lambda_\pi(\Delta X, \Delta Y)$,

$$E\bigl(u^{\Lambda_\pi(\Delta X, \Delta Y)}\bigr) = \frac{1}{q^m!} \sum_{k=0}^{q^m/2} u^k \binom{q^m/2}{k}^2 k!\, 2^k\, \Gamma(q^m/2 - k). \qquad (9.9)$$

[From the definition of $P(d, k)$ and (9.6) it follows that the expression

$$\frac{P(d, k)\,\Gamma(d - k)}{(2d - 2k)!}$$

denotes the number of functions $\varphi$ that take exactly $k$ pairs from $\Pi_A$ to
$k$ fixed pairs of $\Pi_B$ (with some abuse of notation). Then the number of permutations
of $(\mathbb{Z}_q)^m$ for which exactly $k$ pairs of sum $\Delta X$ are mapped into pairs of
sum $\Delta Y$ is given by the expression

$$\binom{q^m/2}{k} \frac{P(q^m/2, k)\,\Gamma(q^m/2 - k)}{(q^m - 2k)!}.$$

From (9.6), we get

$$\binom{q^m/2}{k}^2 k!\, 2^k\, \Gamma(q^m/2 - k).$$

The assertion follows.]

We have to determine the limit of expression (9.9) as $m \to \infty$. For this, we
first calculate the limit of (9.8) (with $d = q^m/2$) as $m \to \infty$. Put

$$T(m, k) := (-1)^k \binom{q^m/2}{k}^2 2^k\, k!\, (q^m - 2k)!. \qquad (9.10)$$

Then for the ratio of two consecutive (with respect to $k$) such expressions
we get

$$\begin{aligned}
\frac{T(m, k+1)}{T(m, k)} &= -2\, \frac{(q^m/2 - k)^2}{(k+1)(q^m - 2k)(q^m - 2k - 1)} \\
&= -2\, \frac{(q^m/2 - k)^2}{4(k+1)(q^m/2 - k)^2 - (k+1)(q^m - 2k)} \\
&= -\Bigl(2(k+1)\Bigl(1 - \frac{1}{q^m - 2k}\Bigr)\Bigr)^{-1}. \qquad (9.11)
\end{aligned}$$

So asymptotically by successive multiplications of the terms (9.11) we obtain

$$\frac{T(m, k)}{T(m, 0)} = \frac{(-1)^k}{2^k\, k!}\, \delta_k, \qquad (9.12)$$






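where $\delta_k \to 1$ when $k \in o(q^m/2)$ ($m \to \infty$), $\delta_0 := 1$, and $\delta_k = O(q^m)$
when $k \notin o(q^m/2)$ ($m \to \infty$); hence (for $\Gamma$ as in (9.8)) $T(m, 0)$ behaves
asymptotically as $\sqrt{e}\,\Gamma(q^m/2)$ in the sense that

$$\frac{\Gamma(q^m/2)}{T(m, 0)} \to \frac{1}{\sqrt{e}} \qquad (m \to \infty). \qquad (9.13)$$

Now, in view of calculating the generating function with variable $u$, we define
an analogous expression as (9.10) (for $u$ in a neighborhood of 1) but including
an additional term $u^k$, replacing the last factorial in (9.10) by $\Gamma$, and without
change of sign $(-1)^k$:

$$T_u(m, k) := \binom{q^m/2}{k}^2 u^k\, k!\, 2^k\, \Gamma(q^m/2 - k). \qquad (9.14)$$

Then, we get the ratio

$$\frac{T_u(m, k)}{T_u(m, 1)} \sim \frac{u^{k-1}}{k!\, 2^{k-1}}, \qquad (9.15)$$

hence the generating function has asymptotic behavior

$$E\bigl(u^{\Lambda_\pi(\Delta X, \Delta Y)}\bigr) \sim \frac{2\, T_u(m, 1)\, e^{u/2}}{u\, q^m!} \qquad (m \to \infty). \qquad (9.16)$$

From (9.14) and an elementary estimation, the term $T_u(m, 1)$ in (9.16)
behaves as

$$T_u(m, 1) = 2 (q^m/2)^2\, u\, \Gamma(q^m/2 - 1) \sim \frac{q^{2m}}{2}\, u\, \frac{1}{\sqrt{e}}\, (q^m - 2)! \qquad (m \to \infty), \qquad (9.17)$$

thus from (9.16) and (9.17)

$$E\bigl(u^{\Lambda_\pi(\Delta X, \Delta Y)}\bigr) \sim \frac{e^{u/2}\, q^{2m}\, (q^m - 2)!}{\sqrt{e}\; q^m!} \to e^{(1/2)(u-1)} \qquad (m \to \infty). \quad \Box \qquad (9.18)$$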
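
The Poisson limit with parameter $1/2$ can be observed in a simulation for $q = 2$, where addition in $(\mathbb{Z}_2)^m$ is bitwise XOR. The sketch below is an illustration under assumptions made here: elements of $(\mathbb{Z}_2)^m$ are encoded as integers below $2^m$, the random permutation is produced by shuffling with Python's `random` module, and all names are hypothetical.

```python
import random
from math import exp, factorial


def count_pairs(perm, dx, dy):
    """Number of unordered pairs {X, X'} with X ^ X' == dx and perm[X] ^ perm[X'] == dy."""
    return sum(1 for x in range(len(perm))
               if x < (x ^ dx) and perm[x] ^ perm[x ^ dx] == dy)


def empirical_distribution(m, dx, dy, trials, seed=0):
    """Empirical law of the pair count over random permutations of (Z_2)^m."""
    rng = random.Random(seed)
    perm = list(range(2 ** m))
    counts = {}
    for _ in range(trials):
        rng.shuffle(perm)
        k = count_pairs(perm, dx, dy)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / trials for k, c in sorted(counts.items())}


def poisson_half(k):
    """Probability of the value k under the Poisson distribution with parameter 1/2."""
    return exp(-0.5) * 0.5 ** k / factorial(k)


dist = empirical_distribution(m=8, dx=1, dy=3, trials=2000)
```

Comparing `dist[k]` with `poisson_half(k)` for $k = 0, 1, 2$ shows the agreement; the remaining deviations are sampling noise plus the finite-$m$ error of the limit theorem.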



Next, let us treat the case where $q$ is odd. In this situation, for any $\Delta X$ there
is, from Lemma 9.2, exactly one $X$ such that $2X = \Delta X$.

Theorem 9.5. Suppose $q \in \mathbb{N}$, $q \ge 3$, $q$ odd. Let $\Delta X, \Delta Y \in (\mathbb{Z}_q)^m$. Then
the distribution of the random variable $\Lambda_\pi(\Delta X, \Delta Y)$ tends weakly to the
distribution (9.7) as $m \to \infty$.

Proof: Let $X \in \mathbb{Z}_q$ be the unique solution of (9.1) and let $X_0 = (X, X,
\ldots, X)$ ($\in (\mathbb{Z}_q)^m$). In contrast to the proof of Theorem 9.4, one has to count
those cases where $\pi(X_0) = X_0$ and $\pi(X_0) \ne X_0$ separately, which yields as
analogue of (9.9) the expression



1 






E^^aaax,ay)^ 
(9“-1)/2 

E 

fc =0 

{q^-l)/2 

+(?’”- 1) E 



+ 1 - 2k)). (9.19) 

k — 1 J 



Now the same type of limit procedure as in the proof of Theorem 9.4, applied 
separately to both sums on the right-hand side of (9.19), yields the result. □ 
What remains is the case where q and all AXi,AYi are even. Here, by Lemma 
9.2, equation (9.1) has exactly 2 solutions, hence we get (by analogy to the 
foregoing cases) the following result: 

Theorem 9.6. Suppose $q \in \mathbb{N}$, $q \ge 2$, $q$ even. Let $\Delta X = (\Delta X_1, \Delta X_2,
\ldots, \Delta X_m)$, $\Delta Y = (\Delta Y_1, \Delta Y_2, \ldots, \Delta Y_m) \in (\mathbb{Z}_q)^m$ such that all $\Delta X_i$ and all
$\Delta Y_i$ are even. Then the distribution of the random variable $\Lambda_\pi(\Delta X, \Delta Y)/2^m$
tends weakly to the distribution (9.7) as $m \to \infty$.

At the end of Chapter 1, in connection with the One-Time Pad, we
discussed the notion of perfect secrecy. The effect of perfect secrecy is that the
adversary, even if he has unlimited resources, can not gain any information 
about the plaintext from the ciphertext, except its length (if this is not a 
known parameter). The fact that any cryptosystem leaks the information 
about the length of the plaintext will be proved below (Theorem 10.1). An- 
other notion, related to perfect secrecy, is that of so-called semantic security. 
Roughly speaking, semantic security is a polynomially bounded variant of 
perfect security, i.e., one assumes that the adversary has only polynomially 
bounded resources. 

Let us fix definitions and notations. 

In this chapter, we will use the term "random variable" in a somewhat
non-classical sense:

Definition 10.1. A random variable is a sequence $\{X_n\}_{n \ge 1}$ of random
variables $X_n$ in the classical sense defined over some common probability space
$(\Omega, \mathcal{A}, P)$ such that there is a polynomial $Q$ so that (for all $n$) $X_n$ ranges
over $\mathbb{B}^{Q(n)}$. The random variable $\{X_n\}_{n \ge 1}$ is called polynomial-time, if there
exists a probabilistic polynomial-time algorithm $A$ such that

$$P(A(1^n) = x) = P(X_n = x)$$

($1^n$ means the string $(1, 1, \ldots, 1) \in \mathbb{B}^n$).

Definition 10.2. A cryptosystem is a triple $(G, E, D)$ of probabilistic
polynomial-time algorithms such that

- For the input $1^n$, algorithm $G$ (the key generator) outputs two bitstrings
$G_1(1^n)$ and $G_2(1^n)$ both of length $n$.

- For every pair $(e, d)$ of encryption/deciphering keys in the range of $G(1^n)$,
the encryption algorithm $E$ and the deciphering algorithm $D$ satisfy, for
each plaintext $x \in \mathbb{B}^n$, the relation

$$P(D_d(E_e(x)) = x) = 1.$$

- There exists a polynomial $Q$ such that for all $e, x \in \mathbb{B}^n$, we have that the
random variable $E_e(x)$ ranges over $\mathbb{B}^{Q(n)}$.

The last requirement has the disadvantage that it always reveals the length $|x|$
of the plaintext. However, cryptosystems always leak the information about
the length of the plaintext, as the following theorem shows:

Theorem 10.1. Let $(G, E, D)$ be a cryptosystem not necessarily satisfying
the length conditions imposed in Definition 10.2. (In particular, $E$ is defined
for every possible key and every plaintext and the only restriction on the
distribution of the length of the ciphertext produced by $E$ is that it must be
polynomial in the length of the inputs to $E$.) Then this scheme cannot hide
the length of the plaintext.

Proof: Let (in accordance with Definition 10.2) $e$ be an encryption key in
the range of $G(1^n)$ and consider the random variables $E_e(1^{|e|})$ and $E_e(1^m)$
for some $m$ that is polynomial in $|e|$ (the length of $e$). If the encryption
hides the length of the plaintext, then these two random variables have
to be polynomially indistinguishable. Let $Q$ be the polynomial bounding
the running time of the encryption algorithm $E$ and let $m$ take the
values $|e|, |e| + 1, \ldots, Q(2|e|) + 1$. Then we find that the random variables
$E_e(1^{|e|})$ and $E_e(1^{Q(2|e|)+2})$ are polynomially indistinguishable, hence (since
$P(|E_e(1^{|e|})| \le Q(2|e|)) = 1$) we have the lower bound

$$P(|E_e(1^{Q(2|e|)+2})| \le Q(2|e|)) \ge 2/3.$$

But from the fact that the code must be uniquely decipherable, it follows
that for at most half of the bitstrings $x$ of length $|x| = Q(2|e|) + 2$ it holds
that

$$P(|E_e(x)| < Q(2|e|) + 2) \ge 1/3.$$

So there exists an $x \in \mathbb{B}^{Q(2|e|)+2}$ such that $E_e(x)$ and $E_e(1^{Q(2|e|)+2})$ are
distinguishable by just measuring their lengths, i.e. in polynomial time. This
is a contradiction. □

Now we come to the definition of semantic security. 

For a set $\Sigma$, the symbol $\Sigma^*$ denotes the set of all finite sequences of elements
of $\Sigma$.

Definition 10.3. A (secret-key) cryptosystem $(G, E, D)$ as in Definition
10.2 is called semantically secure if, for every probabilistic polynomial-time
algorithm $A$, there exists a probabilistic polynomial-time algorithm $A'$ such
that for every polynomial-time random variable $\{X_n\}_{n \ge 1}$, every
polynomial-time computable function $h : \mathbb{B}^* \to \mathbb{B}^*$, every function $f : \mathbb{B}^* \to \mathbb{B}^*$, every
positive constant $c$, and all sufficiently large $n$,

$$P(A(E_{G_1(1^n)}(X_n), h(X_n), 1^n) = f(X_n))
< P(A'(|X_n|, h(X_n), 1^n) = f(X_n)) + n^{-c}.$$

This definition can be explained as follows: The role of the function $h$ is to
provide partial information on $X_n$ to both algorithms, which then try to find
$f(X_n)$. As we shall see in the sequel, the meaning of semantic security is,
roughly speaking, that the distributions of the random variables

$$A(E_{G_1(1^n)}(X_n), h(X_n), 1^n)$$

and

$$A'(|X_n|, h(X_n), 1^n)$$

are close in a certain sense. We will see that if also the function $f$ is supposed
to be computable in polynomial time, then the definition of semantic security
remains equivalent. In Definition 10.2, we considered secret-key
cryptosystems. The corresponding notion of semantic security of a public-key system
should be formulated in the sense that the public key is given to the algorithm
$A$ as additional input. Evidently, any public-key system that is semantically
secure has a probabilistic encryption algorithm (otherwise, look at a random
variable that is uniformly distributed on the two-point set $\{0^n, 1^n\}$).
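
The remark about deterministic public-key encryption can be made concrete. The sketch below is an illustration only: it assumes textbook RSA with well-known toy parameters ($N = 61 \cdot 53 = 3233$, $e = 17$, $d = 2753$) as the deterministic public-key system, encodes $0^n$ and $1^n$ as the integers $0$ and $2^n - 1$, and uses function names chosen here. The adversary simply re-encrypts one of the two candidate plaintexts with the public key and compares.

```python
import secrets

# Textbook RSA with toy parameters (illustration only, far too small for real use).
N, E, D = 3233, 17, 2753


def encrypt(message):
    """Deterministic public-key encryption: anyone can evaluate it."""
    return pow(message, E, N)


def decrypt(ciphertext):
    return pow(ciphertext, D, N)


def distinguisher(ciphertext, m0):
    """Guess 0 if the ciphertext is the encryption of m0, and 1 otherwise."""
    return 0 if ciphertext == encrypt(m0) else 1


def success_rate(trials, bits=8):
    """Fraction of correct guesses when the plaintext is uniform on {0^n, 1^n}."""
    m0, m1 = 0, 2 ** bits - 1  # the strings 0^n and 1^n read as integers
    hits = 0
    for _ in range(trials):
        b = secrets.randbelow(2)
        hits += distinguisher(encrypt(m1 if b else m0), m0) == b
    return hits / trials
```

The distinguisher is always right, whereas an algorithm that only knows the plaintext length can do no better than guessing; this gap is what rules out semantic security for deterministic public-key encryption.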
Now we will show that semantic security is equivalent to so-called indistin- 
guishable encryption, which means the following: 

Definition 10.4. A (secret-key) cryptosystem $(G, E, D)$ as in Definition
10.2 has the indistinguishable encryption property if, for every
polynomial-time random variable $\{(X_n, Y_n, Z_n)\}_{n \ge 1}$ with $|X_n| = |Y_n|$, every
probabilistic polynomial-time algorithm $A$, every positive constant $c$, and all
sufficiently large $n$, we have that

$$|P(A(Z_n, E_{G_1(1^n)}(X_n)) = 1) - P(A(Z_n, E_{G_1(1^n)}(Y_n)) = 1)| < n^{-c}.$$

The random variable $Z_n$ has to be interpreted as additional information, on
the space of plaintexts, given to algorithm $A$, which tries to distinguish the
encryptions of $X_n$ and $Y_n$. A remark analogous to the one we have stated for
semantic security of public-key systems holds here: The public key (i.e. $G_1(1^n)$)
should be taken as an additional input to the algorithm.

Our goal now will be to show that semantic security and indistinguishable
encryption are equivalent properties. Especially the direction that
indistinguishable encryption implies semantic security is very important, since it
often seems easier to prove the former than the latter.

Theorem 10.2. A (secret-key) cryptosystem as in Definition 10.2 is seman- 
tically secure iff it has the property of indistinguishable encryption. 

For the proof of Theorem 10.2 we verify both directions by individual propo- 
sitions, which are both in fact stronger than the corresponding direction of 
Theorem 10.2 and thus of a certain own interest. 

Proposition 10.1. Let (G,E,D) have the property of indistinguishable 
encryption and suppose, furthermore, that $Z_n = X_n Y_n$. Then the system is 
semantically secure. 
Proof: Let us assume that the system is not semantically secure. We will 
prove that in this case it has distinguishable encryptions with $Z_n = X_n Y_n$. 
More precisely, we show the following: if, for some $\{X_n\}_{n\ge 1}$, f, h as in 
Definition 10.3 and some probabilistic polynomial-time algorithm A, the 
algorithm A guesses $f(X_n)$ from the encrypted value and $h(X_n)$ better than a 
suitable probabilistic polynomial-time algorithm A' does on input $|X_n|$ 
and $h(X_n)$, then one can distinguish the encryptions of $X_n$ and $Y_n := 1^{|X_n|}$ 
(using $Z_n = X_n Y_n$ as auxiliary input). Let A be a probabilistic polynomial- 
time algorithm that tries to guess partial information (i.e., $f(X_n)$) from the 
encryption of $X_n$ and the a priori information $h(X_n)$. Namely, on input 
$E_{G_1(1^n)}(x)$ and $h(x)$, the algorithm A tries to guess $f(x)$. Now we construct 
a probabilistic polynomial-time algorithm A' that performs as well 
but does not get the input $E_{G_1(1^n)}(x)$. This algorithm runs algorithm 
A with input $E_{G_1(1^n)}(1^{|x|})$ and $h(x)$. We will show that 

$$P(A(E_{G_1(1^n)}(X_n), h(X_n), 1^n) = f(X_n)) < P(A'(|X_n|, h(X_n), 1^n) = f(X_n)) + n^{-c}$$

or, equivalently, 

$$P(A(E_{G_1(1^n)}(X_n), h(X_n), 1^n) = f(X_n)) < P(A(E_{G_1(1^n)}(1^{|X_n|}), h(X_n), 1^n) = f(X_n)) + n^{-c}.$$

Assume the contrary and let c > 0 be such that the above fails for infinitely 
many n. Then we have 

$$P(X_n \in B_n) > \tfrac12 n^{-c},$$

where $B_n$ is the set of bitstrings x of length m with the property 

$$P(A(E_{G_1(1^n)}(x), h(x), 1^n) = f(x)) > P(A(E_{G_1(1^n)}(1^m), h(x), 1^n) = f(x)) + \tfrac12 n^{-c}.$$

If $D_n$ denotes the set of bitstrings x of length m satisfying 

$$|P(A(E_{G_1(1^n)}(x), h(x), 1^n) = \xi) - P(A(E_{G_1(1^n)}(1^m), h(x), 1^n) = \xi)| > \tfrac12 n^{-c} \qquad (10.1)$$

for some $\xi$, then we have 

$$P(X_n \in D_n) > \tfrac12 n^{-c}.$$

We now define a random variable $\{Z_n\}_{n\ge 1} = \{X_n 1^m\}_{n\ge 1}$ and construct a 
polynomial-time probabilistic algorithm $A_1$ that, given auxiliary information 
$(X_n, 1^m)$ (where $m := |X_n|$), distinguishes the encryptions of $X_n \in D_n$ and 
$1^m$. The algorithm $A_1$ is defined as follows: 

On input x, $1^m$, and $E_e(w)$ (where $w \in \{x, 1^m\}$ and e is in the range of 
$G_1(1^n)$), the algorithm has two steps. Roughly speaking, the first step consists 
of checking whether $x \in D_n$ and, if so, finding a $\xi$ satisfying relation (10.1). 
(This can be interpreted as some sort of “witness” for the fact that $x \in D_n$.) 
Then, in the second step, the algorithm “guesses” the identity of w by 
checking whether $A(E_e(w), h(x), 1^n) = \xi$ or not. In detail: 

- Ignoring $E_e(w)$, the algorithm first gathers information on the statistics of 
the random variables $A(E_{G_1(1^n)}(x), h(x), 1^n)$ and $A(E_{G_1(1^n)}(1^m), h(x), 1^n)$ 
by computing $h(x)$ and running A polynomially often (depending on 
the desired accuracy and determined by relation (10.3) below), 
each time giving to A as input a freshly computed $E_{G_1(1^n)}(x)$, resp. 
$E_{G_1(1^n)}(1^m)$, and the data $h(x)$ and $1^n$. Put 

$$p_{x,v}(\xi) := P(A(E_{G_1(1^n)}(v), h(x), 1^n) = \xi) \qquad (10.2)$$

and let $\hat p_{x,v}(\xi)$ be a random variable representing the estimator of $p_{x,v}(\xi)$ 
that is obtained by polynomially many runs, the polynomial again chosen 
such that (10.3) below holds. If we fix x and v, then with probability 
at least, say, $1 - 2^{-n}$, for every possible value $\xi$ we have that $\hat p_{x,v}(\xi)$ is a 
good estimator for $p_{x,v}(\xi)$ in the sense that 

$$|\hat p_{x,v}(\xi) - p_{x,v}(\xi)| < [\ldots] \qquad (10.3)$$

If $x \in D_n$, it holds that 

$$|p_{x,x}(\xi) - p_{x,1^m}(\xi)| > \tfrac12 n^{-c}.$$

So from (10.3) we find that with very high probability, if $x \in D_n$, we can 
find a $\xi$ with 

$$|\hat p_{x,x}(\xi) - \hat p_{x,1^m}(\xi)| > [\ldots]\, n^{-c}. \qquad (10.4)$$

If such a $\xi$ cannot be found (as just mentioned, this occurs with very low 
probability), then the algorithm $A_1$ terminates here and outputs (regardless 
of $E_e(w)$) the value 1. 

- Assume we have found the above-mentioned $\xi$. W.l.o.g. we may assume 
that 

$$\hat p_{x,x}(\xi) > \hat p_{x,1^m}(\xi) + [\ldots]$$

Now algorithm $A_1$ runs the algorithm $A(E_e(w), h(x), 1^n)$ and gives 1 as 
output iff A yields output $\xi$. 
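
The first step of the algorithm is an ordinary sampling estimate. As a minimal sketch, assuming Hoeffding's inequality together with a union bound as the tool that fixes the number of runs (the text only says "polynomially many"), and with a toy stand-in for A whose output probabilities are invented for illustration, the estimation and the search for a witness could look as follows. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def runs_needed(accuracy: float, failure_prob: float, num_values: int) -> int:
    """Number of runs so that all estimated frequencies are within `accuracy`
    of the true probabilities, except with probability <= failure_prob
    (Hoeffding's inequality plus a union bound over the possible outputs)."""
    return math.ceil(math.log(2 * num_values / failure_prob) / (2 * accuracy ** 2))


def estimate(run_once, runs: int) -> dict:
    """Empirical output distribution of a randomized procedure."""
    counts = Counter(run_once() for _ in range(runs))
    return {value: count / runs for value, count in counts.items()}


def find_witness(est_x: dict, est_ones: dict, threshold: float):
    """Return an output value whose two estimated probabilities differ by
    more than `threshold`, or None if there is no such value."""
    candidates = list(est_x) + [v for v in est_ones if v not in est_x]
    for xi in candidates:
        if abs(est_x.get(xi, 0.0) - est_ones.get(xi, 0.0)) > threshold:
            return xi
    return None


# Toy stand-in for A (assumption): it outputs 1 with probability 0.7 on
# encryptions of x and with probability 0.5 on encryptions of 1^m.
rng = random.Random(2004)
runs = runs_needed(accuracy=0.05, failure_prob=1e-6, num_values=2)
est_x = estimate(lambda: int(rng.random() < 0.7), runs)
est_ones = estimate(lambda: int(rng.random() < 0.5), runs)
witness = find_witness(est_x, est_ones, threshold=0.1)
print(runs, witness)
```

The number of runs grows only like the inverse square of the accuracy and logarithmically in the failure probability, which is why an accuracy of order $n^{-c}$ and a failure probability of $2^{-n}$ are still reachable in polynomial time.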

It remains to analyze the performance of the just-defined algorithm $A_1$ (i.e., 
to prove that it is really a probabilistic polynomial-time algorithm) on input 
$E_e(w)$ (where $w \in \{x, 1^m\}$), $1^m$, and x. We have to distinguish 3 possible 
cases: 
- If $x \in D_n$ (this event will be denoted by $C_1$ in the sequel), w.l.o.g. we may 
assume that the $\xi$ (which has been found with probability at least $1 - 2^{-n}$, 
say) satisfies 

$$p_{x,x}(\xi) > p_{x,1^m}(\xi) + \tfrac38 n^{-c}.$$

Then it follows that 

$$P(A_1(x, 1^m, E_{G_1(1^n)}(x)) = 1) - P(A_1(x, 1^m, E_{G_1(1^n)}(1^m)) = 1) > (1 - 2^{-n})\,\tfrac38 n^{-c} - 2^{-n} \ge \tfrac38 n^{-c} - \tfrac{1}{32} n^{-2c}.$$



- If $x \notin D_n$, yet there exists a $\xi$ with 

$$|p_{x,x}(\xi) - p_{x,1^m}(\xi)| \ge [\ldots]$$

(the event of these two conditions will be denoted by $C_2$), then with 
probability at least, say, $1 - 2^{-n}$, one of the following two alternatives happens: 
Either $A_1$ has terminated after its first step or the expression (estimator) 
$\hat p_{x,x}(\xi) - \hat p_{x,1^m}(\xi)$ has the same sign as its “true” counterpart 
$p_{x,x}(\xi) - p_{x,1^m}(\xi)$. It follows that 

$$P(A_1(x, 1^m, E_{G_1(1^n)}(x)) = 1) - P(A_1(x, 1^m, E_{G_1(1^n)}(1^m)) = 1) > -2^{-n} \ge -\tfrac{1}{32} n^{-2c}.$$

- In the remaining case (event $C_3$), independently of whether the estimator 
calculated in the first step of $A_1$ returns the correct result or not, one finds 

$$P(A_1(x, 1^m, E_{G_1(1^n)}(x)) = 1) - P(A_1(x, 1^m, E_{G_1(1^n)}(1^m)) = 1) \ge [\ldots]$$



Let $H(z,t)$ denote the event that $A_1$ yields 1 on input $(z, E_{G_1(1^n)}(t))$. We 
find 

$$P(H(Z_n, X_n)) - P(H(Z_n, 1^m)) \ge \sum_x P(X_n = x)\,\bigl(P(H(x1^m, x)) - P(H(x1^m, 1^m))\bigr) \ge \tfrac12 n^{-c}\cdot\tfrac38 n^{-c} - \tfrac{1}{32} n^{-2c} - \tfrac18 n^{-2c} = \tfrac{1}{32} n^{-2c}.$$

So $A_1$ distinguishes the encryptions of the “halves” (i.e., $X_n$ and $1^m$) of the 
polynomial-time random variable $Z_n$. □ 



Proposition 10.2. Let the cryptosystem (G,E,D) be semantically secure 
as in Definition 10.3. Furthermore, suppose that f is polynomial-time 
computable, that the quantifiers are reversed in the sense that instead of the scheme 

$$\forall A\ \exists A'\ \forall \{X_n\}\ \forall h\ \forall f$$

(“strongest” possible order) we have the “weakest” possible order 

$$\forall A\ \forall \{X_n\}\ \forall h\ \forall f\ \exists A',$$

and that the conditional distribution of $f(X_n)$ given $h(X_n)$ is a symmetric 
Bernoulli one. Then (G,E,D) satisfies the property of indistinguishable 
encryption. 

Proof: The proof is of a similar nature to that of Proposition 10.1. Assume 
that (G,E,D) has distinguishable encryptions, i.e., that there exist a 
polynomial-time random variable $\{T_n\}_{n\ge 1} = \{X_n Y_n Z_n\}_{n\ge 1}$, a probabilistic 
polynomial-time algorithm A and a positive constant c such that for infinitely 
many n, it holds that 

$$|P(A(Z_n, E_{G_1(1^n)}(X_n)) = 1) - P(A(Z_n, E_{G_1(1^n)}(Y_n)) = 1)| > n^{-c}. \qquad (10.5)$$

We may assume $|X_n| = |Z_n| = n$. Let $\{Q_n\}_{n\ge 1}$ be a (polynomial-time) 
random variable that takes the two possible values $0^n Z_n X_n Y_n$ and $1^n Z_n Y_n X_n$, 
both with probability one half. Let $f : \mathbb{B}^{4n} \to \mathbb{B}$ be the function that 
returns the first bit of every bitstring of length 4n. On the other hand, define 
$h : \mathbb{B}^{4n} \to \mathbb{B}^{3n}$ as the function that omits the first n-block of its argument 
and, if this block was $1^n$, interchanges the order of the last two n-blocks 
of the argument. By this definition, the random variables $h(Q_n) = Z_n X_n Y_n$ 
and $f(Q_n)$ are independent. Also, f and h are computable in polynomial time 
and do not depend on A. Let us now construct a probabilistic polynomial- 
time algorithm $A_2$ guessing $f(Q_n)$ from $h(Q_n)$ and $E_{G_1(1^n)}(Q_n)$. Let $\delta$ be 
of the form $0^n wxv$ or $1^n wvx$. Again, $A_2$ will consist of two steps on 
input $h(\delta) = wxv$ and $E_e(\delta)$. 

- Ignoring $E_e(\delta)$, algorithm $A_2$ samples polynomially many times and 
calculates an estimator of the difference 

$$\Delta(w,x,v) := P(A(w, E_{G_1(1^n)}(x)) = 1) - P(A(w, E_{G_1(1^n)}(v)) = 1) \qquad (10.6)$$

such that this estimator differs from the true value by less than $[\ldots]$ with 
probability at least $1 - 2^{-n}$, say. 

- W.l.o.g. we assume that the estimator calculated in (10.6) is positive. Then 
the algorithm $A_2$ gives the fourth n-block of $E_e(\delta)$ together with w as input 
to algorithm A and outputs 1 if A outputs 1 and 0 otherwise. 

It remains to do the performance analysis of $A_2$. Let $H_n$ be the event that 
$A_2$ successfully guesses $f(Q_n)$ given $E_{G_1(1^n)}(Q_n)$ and $h(Q_n)$. Furthermore, 
$L_n$ denotes the event that the estimator (10.6) has the correct sign. Then we 
obtain 

$$P(H_n \mid L_n, h(Q_n) = wxv)$$
$$= P(f(Q_n) = 1)\,P(H_n \mid L_n, h(Q_n) = wxv, f(Q_n) = 1) + P(f(Q_n) = 0)\,P(H_n \mid L_n, h(Q_n) = wxv, f(Q_n) = 0)$$
$$= \tfrac12 P(A(w, E_{G_1(1^n)}(x)) = 1) + \tfrac12 P(A(w, E_{G_1(1^n)}(v)) = 0)$$
$$= \frac12 + \frac{\Delta(w,x,v)}{2}.$$

W.l.o.g. we may assume that $\Delta(w,x,v) > 0$. Now we split the situation into 
3 possible cases: 

- If $\Delta(w,x,v) > [\ldots]$ (denote this event by $K_1$), then with probability, say, 
$1 - 2^{-n}$ the estimator has the correct sign. So 

$$P(H_n \mid h(Q_n) = wxv) \ge P(\text{estimator correct})\,\bigl(\tfrac12 + [\ldots]\bigr).$$

- If $[\ldots] < \Delta(w,x,v) \le [\ldots]$ (this event will be called $K_2$), then here also 
with probability, say, $1 - 2^{-n}$, the sign of the estimator is the correct one 
and one calculates 

$$P(H_n \mid h(Q_n) = wxv) \ge P(\text{estimator correct})\,\bigl(\tfrac12 + [\ldots]\bigr).$$

- If the remaining event (which will be called $K_3$) holds (i.e., if $\Delta(w,x,v) \le [\ldots]$), 
then 

$$P(H_n \mid h(Q_n) = wxv) \ge [\ldots]$$
Putting the three cases together, we thus obtain 

$$P(H_n) = \sum_{j=1}^{3} P(K_j)\,P(H_n \mid K_j).$$

This means that the probability of success of the algorithm is significantly 
greater than 1/2, which is a contradiction. □ 




11 * Algorithmic Complexity 



In this chapter, we present a “non-classical” definition of randomness that is 
of a quite different nature from the other criteria in the previous chapters. 
Namely, loosely speaking, a bitstring can be called “random” if the shortest 
program (in the sense of a Turing machine) for describing the string is the 
string itself. The length of this shortest program can be viewed as some 
sort of “algorithmic complexity” measure, which itself is of rather theoretical 
value, but one can show that it is indeed in “most” cases closely related 
to the linear complexity. So somewhat surprisingly, for “most” cases (in a 
measure-theoretic sense to be specified), the linear complexity seems to be 
a “universal” randomness criterion! (However, this definition does not apply 
to individual sequences!) 

Definition 11.1. The Turing-Kolmogorov-Chaitin complexity (TKC 
complexity for short) $\chi(x)$ of a bitstring $x \in \mathbb{B}^n$ of length n is the length of 
the shortest program for a universal Turing machine U that makes U 
simulate a Turing machine generating x. 

Unfortunately, we have 

Proposition 11.1. In general, the function $\chi$ is not computable. 

(Note that, in line with the famous Church thesis, computability by a Turing 
machine has turned out to be equivalent to all other “reasonable” notions of 
computability, e.g., to being a general recursive function.) 

Proof of Proposition 11.1: Assume the contrary and let K > 0 be any 
constant and $T_K$ a Turing machine that generates and inspects bitsequences 
in lexicographical order until a sequence x with $\chi(x) > K$ appears and then 
accepts this x. Denote by $\mu(T_K)$ the length of the program for U that makes 
U simulate $T_K$. Then we have 

$$\mu(T_K) = O(\log K),$$

but on the other hand, since $T_K$ generates a sequence x with $\chi(x) > K$, 

$$\mu(T_K) \ge \chi(x) > K.$$

So if K is chosen large enough, this yields a contradiction. □ 



D. Neuenschwander: Prob. and Stat. Methods in Cryptology, LNCS 3028, pp. 135-138, 2004. 
© Springer- Verlag Berlin Heidelberg 2004 







However, one can estimate the average behavior of $\chi$. 

Proposition 11.2. For $1 \le k \le n$ we have 

$$|\{x \in \mathbb{B}^n : \chi(x) \le k\}| \le 2^{k+1}.$$

Proof: This follows from the fact that among the $2^{k+1}$ Turing machines T 
with $\mu(T) \le k$ there are at most $2^{k+1}$ sequence generators. □ 

As a consequence, we obtain that “practically all sequences of moderately 
large length” have a TKC complexity close to the length of the sequence. In 
other words, a “truly random sequence has no shorter description than just 
the sequence itself”. 

Corollary 11.1. If one assumes that the sequences x are uniformly 
distributed on $\mathbb{B}^n$, then 

$$P(\chi(x) > (1 - \varepsilon)n) \ge 1 - 2^{-\varepsilon n + 1} \qquad (\varepsilon > 0).$$
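
The step from Proposition 11.2 to Corollary 11.1 is a one-line count. The following worked derivation is a sketch, reading Proposition 11.2 as the bound $|\{x : \chi(x) \le k\}| \le 2^{k+1}$ and assuming $(1-\varepsilon)n \ge 1$ so that $k = \lfloor (1-\varepsilon)n \rfloor$ is admissible; the numerical values are chosen only for illustration.

```latex
\begin{align*}
P\bigl(\chi(x) \le (1-\varepsilon)n\bigr)
  &= 2^{-n}\,\bigl|\{x \in \mathbb{B}^n : \chi(x) \le \lfloor (1-\varepsilon)n \rfloor\}\bigr| \\
  &\le 2^{-n}\cdot 2^{(1-\varepsilon)n+1} = 2^{-\varepsilon n+1}, \\
\text{hence}\quad
P\bigl(\chi(x) > (1-\varepsilon)n\bigr) &\ge 1 - 2^{-\varepsilon n+1}. \\[4pt]
\text{Example } (n = 100,\ \varepsilon = 0.1):\quad
P\bigl(\chi(x) > 90\bigr) &\ge 1 - 2^{-9} \approx 0.998 .
\end{align*}
```

So already for strings of 100 bits, fewer than one string in 500 can be compressed by more than ten percent.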



Next we show that the TKC complexity $\chi(x)$ and the linear complexity $L(x)$ 
are asymptotically the same for “practically all” sequences $x \in \mathbb{B}^n$ (as 
$n \to \infty$). 
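
In contrast to $\chi$, linear complexity can be computed efficiently. The following is a minimal sketch, assuming that linear complexity means the length of the shortest linear feedback shift register over GF(2) that generates the sequence; how the chapter normalizes $L(x)$ relative to this length is fixed elsewhere in the book and is not assumed here. The sketch uses the Berlekamp-Massey algorithm and checks the result against the standard count $2^{\min(2l-1,\,2n-2l)}$ of length-n sequences with shortest register length $l \ge 1$. The function names are illustrative.

```python
from collections import Counter
from itertools import product


def linear_complexity(bits):
    """Length of the shortest LFSR over GF(2) generating `bits`
    (Berlekamp-Massey algorithm)."""
    n = len(bits)
    c = [1] + [0] * n      # current connection polynomial
    b = [1] + [0] * n      # connection polynomial before the last length change
    length, m = 0, -1      # current LFSR length, index of the last length change
    for i in range(n):
        # Discrepancy between the next bit and the LFSR prediction.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length, m, b = i + 1 - length, i, previous
    return length


def complexity_histogram(n):
    """Number of bitstrings of length n for each linear complexity value."""
    return Counter(linear_complexity(x) for x in product((0, 1), repeat=n))


def expected_count(n, l):
    """Standard count of length-n sequences with shortest LFSR length l."""
    return 1 if l == 0 else 2 ** min(2 * l - 1, 2 * n - 2 * l)


histogram = complexity_histogram(8)
print(sorted(histogram.items()))
```

For n = 8 half of all strings have shortest register length 4, and the counts fall off geometrically on both sides, which is the kind of concentration the lemmas below quantify.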



Theorem 11.1. Let $\omega$ be the Haar measure (uniform probability distribution) 
on $\mathbb{B}^\infty$. Then 

$$[\ldots] \to 1 \qquad (n \to \infty).$$

(As usual, “a.s.” means “almost surely”, i.e., “$\omega(\ldots) = 1$”.) 
The proof needs a sequence of several lemmas. 

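Lemma 11.1. For $t, u \in\ ]0, n[$ we have 

$$|\{x \in \mathbb{B}^n : n + t < L(x)\}| \le \tfrac13 2^{n-t+1} - \tfrac13, \qquad (11.1)$$

$$|\{x \in \mathbb{B}^n : \chi(x) < n - u\}| < 2^{n-u+1}, \qquad (11.2)$$

$$|\{x \in \mathbb{B}^n : L(x) \le n + t,\ n - u \le \chi(x)\}| \ge 2^n - \tfrac13 2^{n-t+1} - 2^{n-u+1}. \qquad (11.3)$$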

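Proof: Inequality (11.2) and the conclusion (11.1), (11.2) $\Rightarrow$ (11.3) are trivial. 
So it suffices to verify (11.1). By Corollary 7.1 we have indeed 

$$|\{x \in \mathbb{B}^n : n + t < L(x)\}| \le \sum_{n+t<2i\le 2n} 2^{2n-2i} \le \sum_{0\le i\le n-\lfloor (n+t)/2\rfloor-1} 2^{2n-2\lfloor (n+t)/2\rfloor-2-2i}$$

$$= 2^{2n-2\lfloor (n+t)/2\rfloor-2}\cdot\frac{1 - 2^{-2(n-\lfloor (n+t)/2\rfloor)}}{3/4} \le \tfrac13 2^{n-t+1} - \tfrac13. \qquad \Box$$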



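Lemma 11.2. 

$$|\{x \in \mathbb{B}^n : 0 \le L(x) < n - t\}| \le \tfrac13 2^{n-t+1} + \tfrac13. \qquad (11.4)$$

Proof: 

$$|\{x \in \mathbb{B}^n : 0 \le L(x) < n - t\}| \le 1 + \sum_{1\le 2i<n-t} 2^{2i-1} \le 1 + \sum_{2\le 2i\le 2\lceil (n-t)/2\rceil-2} 2^{2i-1}$$

$$= 1 + 2\sum_{0\le i\le \lceil (n-t)/2\rceil-2} 2^{2i} \le \tfrac13 2^{n-t+1} + \tfrac13. \qquad \Box$$
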

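Proposition 11.3. For all $\varepsilon \in\ ]0, 1[$ we have (under the uniform distribution 
of x on $\mathbb{B}^n$) 

$$P((1 - \varepsilon)L(x) \le \chi(x)) \ge 1 - \tfrac83 2^{-\varepsilon n/(2-\varepsilon)}, \qquad (11.5)$$

$$P((1 - \varepsilon)\chi(x) \le L(x)) \ge 1 - \tfrac13 2^{-\varepsilon n+(1-\varepsilon)(1+\log n)+1} - \tfrac13 2^{-n}. \qquad (11.6)$$

Proof: Put $t := \varepsilon n/(2-\varepsilon)$, so that $(1 - \varepsilon)(n + t) = n - t$. Now with the aid of 
Lemma 11.1 we get 

$$P((1-\varepsilon)L(x) \le \chi(x)) \ge P(L(x) \le n + t,\ (1-\varepsilon)(n+t) \le \chi(x))$$
$$= P(L(x) \le n + t,\ n - t \le \chi(x))$$
$$\ge 1 - \tfrac13 2^{-t+1} - 2^{-t+1}$$
$$= 1 - \tfrac83 2^{-\varepsilon n/(2-\varepsilon)},$$

which proves (11.5). For (11.6), put 

$$t := \varepsilon n - (1 - \varepsilon)\lceil \log n\rceil,$$

so that 

$$(1 - \varepsilon)(n + \lceil \log n\rceil) = n - t.$$

Now from Lemma 11.2 

$$P((1-\varepsilon)\chi(x) \le L(x)) \ge P(\chi(x) \le n + \lceil \log n\rceil,\ (1-\varepsilon)(n + \lceil \log n\rceil) \le L(x))$$
$$= P(n - t \le L(x))$$
$$\ge 1 - \tfrac13 2^{-\varepsilon n+(1-\varepsilon)(1+\log n)+1} - \tfrac13 2^{-n}. \qquad \Box$$
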

Corollary 11.2. For $\varepsilon \in\ ]0, 1[$, we have (under the uniform distribution of x 
on $\mathbb{B}^n$) 

$$P\bigl((1 - \varepsilon)L(x^{(n)}) \le \chi(x^{(n)}) \le (1 + \varepsilon)L(x^{(n)})\bigr) \to 1 \qquad (n \to \infty).$$

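Proof of Theorem 11.1: For $x = (x_0, x_1, \ldots) \in \mathbb{B}^\infty$ ($m < n$), denote 

$$x^{(m,n)} := (x_m, x_{m+1}, \ldots, x_n).$$

Define, for $k \in \mathbb{N}_0$ and $\varepsilon \in\ ]0, 1[$, the (independent) events 

$$A_{k,\varepsilon} := \{x \in \mathbb{B}^\infty : (1 - \varepsilon)L(x^{(2^{k-1},2^k-1)}) \le \chi(x^{(2^{k-1},2^k-1)}) \le (1 + \varepsilon)L(x^{(2^{k-1},2^k-1)})\}.$$

Then Corollary 11.2 yields 

$$\sum_{k=1}^{\infty} \omega(A_{k,\varepsilon}) = \infty$$

and the assertion follows from the Borel-Cantelli Lemma. □ 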




12 Birthday Paradox and Meet-in-the-Middle 
Attack 



12.1 The Classical Birthday Attack 

In this chapter, we will discuss the aspect of integrity, i.e., the danger that 
Eve could change the message sent by Alice so that Bob does not notice that 
the message he receives is now fraudulent. 

The following so-called “birthday paradox” is well known in probability 
theory: Suppose there are 23 persons in a group. Then the probability that there 
exist two persons whose birthdays coincide is more than 1/2. More generally, 
consider a group of k persons and let n be the number of possible “birthdays” 
(so in the above example k = 23 and n = 365). Let $q_{n,k}$ be the probability 
that there exist no two persons with the same “birthday”. One calculates 

$$q_{n,k} = \prod_{i=1}^{k-1}\Bigl(1 - \frac{i}{n}\Bigr) < \prod_{i=1}^{k-1} e^{-i/n} = e^{-k(k-1)/(2n)}.$$

So if 

$$k > \bigl(1 + \sqrt{1 + 8n\log 2}\bigr)/2, \qquad (12.1)$$

we have $q_{n,k} < 1/2$. 
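
The product formula and the threshold (12.1) are easy to check numerically. The following is a minimal sketch, assuming uniformly distributed and independent birthdays as in the calculation above; the function names are illustrative.

```python
import math


def no_collision_probability(n: int, k: int) -> float:
    """q_{n,k}: probability that k persons have pairwise different
    birthdays when there are n equally likely birthdays."""
    q = 1.0
    for i in range(1, k):
        q *= 1 - i / n
    return q


def exponential_bound(n: int, k: int) -> float:
    """Upper bound exp(-k(k-1)/(2n)) for q_{n,k}."""
    return math.exp(-k * (k - 1) / (2 * n))


def birthday_threshold(n: int) -> float:
    """Right-hand side of (12.1)."""
    return (1 + math.sqrt(1 + 8 * n * math.log(2))) / 2


print(birthday_threshold(365))
print(no_collision_probability(365, 22), no_collision_probability(365, 23))
```

For n = 365 the threshold lies just below 23, in agreement with the classical statement that 23 persons suffice.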

Now we will introduce the notion of a so-called hash function. A hash 
function is a function that maps bitstrings of arbitrary (but finite) length to 
bitstrings of some fixed maximal length n. We will assume that for every 
bitstring x, the image $h(x)$ is easy to compute, but that it is computationally 
infeasible to find, for a given value y, an inverse image x such that $y = h(x)$. 
The cryptologic application is that if x is the message that Alice wants to 
transfer to Bob, then she in fact sends $h(x)$, which Eve cannot invert. 
However, since x can be arbitrarily long, but $h(x)$ has maximal length n, there 
must be collisions, i.e., bitstrings x, x' such that $x \ne x'$ but $h(x) = h(x')$. The 
hash function h is called strongly collision resistant if it is “infeasible” to find 
two different colliding bitstrings x, x' such that $h(x) = h(x')$. The so-called 
birthday attack is the attack to find such x, x' with not too small probability. 
By the preceding discussion (with n now playing the role of the number of 
possible hash values, which are assumed to behave like uniformly distributed 
“birthdays”), if one chooses, e.g., $k > (1 + \sqrt{1 + 8n\log 2})/2$ 
bitstrings $x_{(i)}$ $(1 \le i \le k)$ and calculates their values $h(x_{(i)})$, then with 
probability more than 1/2 there are two different bitstrings $x_{(i_1)}$, $x_{(i_2)}$ such that 
$h(x_{(i_1)}) = h(x_{(i_2)})$. 
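
The birthday attack on a hash function can be tried out on a deliberately weakened hash. The following is a minimal sketch, assuming SHA-256 truncated to a small number of bits as a stand-in for a hash function with few possible values; the message format and the function names are illustrative. A dictionary of the values seen so far replaces any explicit pairwise comparison.

```python
import hashlib


def truncated_hash(message: bytes, bits: int = 24) -> int:
    """First `bits` bits of SHA-256, so there are 2**bits possible values."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def birthday_attack(bits: int = 24):
    """Hash distinct messages until two of them collide.

    Returns the two colliding messages and the number of messages hashed.
    Termination is guaranteed by the pigeonhole principle.
    """
    seen = {}
    counter = 0
    while True:
        message = b"message-%d" % counter
        value = truncated_hash(message, bits)
        if value in seen:
            return seen[value], message, counter + 1
        seen[value] = message
        counter += 1


first, second, tries = birthday_attack(bits=20)
print(first, second, tries)
```

With $2^{20}$ possible values, the threshold (12.1) is about 1200 messages, far fewer than the roughly one million trials a search for a preimage of a fixed value would need on average.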






12.2 The Generalized Birthday Problem and Its Limit 
Distribution 



In the following, we will consider a variant of the birthday 
problem, which will be the key to the so-called meet-in-the-middle attack that we 
will present in the next section. Consider a set E of n elements and draw two 
samples $E_r$ and $E_s$ of sizes r, resp. s, (with replacement) from it. What is 
the probability $P(n,r,s,i)$ that exactly i elements belong to both samples? 
Define 

$$Q(n,r,k) := P(|E_r| = r - k)$$

(the probability of k coincidences in one sample (with replacement) of size 
r) and 

$$H(n, r-k, s-\ell, i) := P(|E_r \cap E_s| = i \mid |E_r| = r - k,\ |E_s| = s - \ell)$$

(this is the probability that the intersection of the two samples of size $r - k$, 
resp. $s - \ell$, drawn without replacement contains i elements). Then we obtain 

$$P(n,r,s,i) = P\Bigl(\bigcup_{k=0}^{r-i}\bigcup_{\ell=0}^{s-i}\{|E_r| = r-k,\ |E_s| = s-\ell,\ |E_r \cap E_s| = i\}\Bigr)$$

$$= \sum_{k=0}^{r-i}\sum_{\ell=0}^{s-i} P(|E_r \cap E_s| = i \mid |E_r| = r-k,\ |E_s| = s-\ell)\,P(|E_r| = r-k,\ |E_s| = s-\ell)$$

$$= \sum_{k=0}^{r-i}\sum_{\ell=0}^{s-i} Q(n,r,k)\,H(n, r-k, s-\ell, i)\,Q(n,s,\ell). \qquad (12.2)$$



Standard combinatorial reasoning yields that $H(\ldots)$ is given by the 
hypergeometric distribution: 

$$H(n,r,s,i) = \frac{\binom{r}{i}\binom{n-r}{s-i}}{\binom{n}{s}}. \qquad (12.3)$$



Now let us evaluate $Q(n,r,c)$. Clearly 

$$Q(n,r,c) = 0 \qquad (r > n,\ c < r - n). \qquad (12.4)$$

In the other cases, observe that the c coincidences are drawn from a set of 
$r - c$ elements, which is equivalent to choosing from the r drawings $\alpha_1$ ones 
to yield the first element, $\alpha_2$ ones to yield the second element, etc., until one 
has chosen all $r - c$ elements. If we define 

$$R := \Bigl\{\alpha := (\alpha_1, \alpha_2, \ldots, \alpha_{r-c}) : \alpha_i \in \{1, 2, \ldots, c+1\},\ \sum_{i=1}^{r-c}\alpha_i = r\Bigr\},$$

then the number of ways each vector in R can be ordered is given by the 
multinomial coefficient $\binom{r}{\alpha_1,\ldots,\alpha_{r-c}}$. Hence 

$$Q(n,r,c) = \frac{1}{n^r}\binom{n}{r-c}\sum_{\alpha\in R}\binom{r}{\alpha_1,\ldots,\alpha_{r-c}}. \qquad (12.5)$$



Let us now determine the limit of $P(n,r,s,i)$ (for $n, r, s \to \infty$ under suitable 
common behavior of n, r, and s). It turns out that it is given by a Poisson 
distribution as follows: 

Theorem 12.1. 

$$P(n,r,s,i) \to e^{-\nu}\frac{\nu^i}{i!} \qquad \Bigl(n,r,s \to \infty,\ \frac{r^2}{2n} \to \lambda,\ \frac{s^2}{2n} \to \mu,\ \frac{rs}{n} \to \nu\Bigr).$$

For the proof, we will separately (in the form of lemmas and corollaries) 
consider the asymptotic behavior of $H(\ldots)$ and $Q(\ldots)$. Then relation (12.2) will 
yield the result. The following limit theorem for the hypergeometric 
distribution is well known and easy to prove: 

Lemma 12.1. 

$$H(n,r,s,i) \to e^{-\nu}\frac{\nu^i}{i!} \qquad \Bigl(n,r,s \to \infty,\ \frac{rs}{n} \to \nu\Bigr).$$

Lemma 12.2. 

$$Q(n,r,c) \to e^{-\lambda}\frac{\lambda^c}{c!} \qquad \Bigl(n,r \to \infty,\ \frac{r^2}{2n} \to \lambda\Bigr).$$




Proof: If $\alpha \in R$ has k components $\ne 1$ (evidently $k \le c$), the summand 
$[\ldots]$ in the sum (12.5) occurs exactly $[\ldots]$ times, hence, if we define $R_k$ as 
the set of all non-decreasing sequences $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)$ of length k with 
elements in $\{2, 3, \ldots, c+1\}$ such that $\sum_j \alpha_j = c + k$, we can write 

(r-c) (r — c)! 



Q{n,r,c) = 



rP 

( ” ) 
\r—c/ 



E 



E 



k\(r — c — k)\ 
k—l ^ cx^Rk 

C r-i-k ^ 

(-^)!EVE(n«^o- 



r, 2 c 



aGRk 



— (1 + ^) 

n’’(n — r — c)! 2°c! r 



g-rV(2n) j,2c 
with 



c-l 

7 := 2°r^“'^c! r~^ j j\. 

i=i 



So finally we find 

g(n,r,c)~e-’'^/(2")(^)7c!, 
which yields the result. □ 

For the determination of the limit of $P(\ldots)$, we study the behavior of $H(\ldots)$ 
in some more detail: 

Lemma 12.3. For ^ ^ we have the estimations 



, K ^ Nl , 

- Ni^{N -K)\ - 



K 



)■ 



This lemma yields the following two inequalities: 

Corollary 12.1. 

H{n, r, s,i)fi{n,r,s,i,k,£) < H{n,r — k, s — £, i), 



where 

/j(n, r, s, i, k, £) := g{r, i, k)g{s, i, £)g{n, r, k)g{n, s, £)e»->— , 



with 

, ... _h.—±L. i N fc 

g{r,i,k):=e - ->(1--)'". 

r 

Corollary 12.2. 

H{n,r — k, s — £,i) < H{n, r, s, i)fs{n, r,s,i,k,£), 



where 

fs{n,r,s,i,k,£) := g{n,r,i,k)g{n,s,i,£)e^^^^^^ ^ , 

with 

g(n,r,i,k) := . 

With these two corollaries, we obtain the limits 

Corollary 12.3. 

$$f_i(n,r,s,i,k,\ell),\ f_s(n,r,s,i,k,\ell) \to 1 \qquad \Bigl(n,r,s \to \infty,\ \frac{r^2}{2n} \to \lambda,\ \frac{s^2}{2n} \to \mu,\ \frac{rs}{n} \to \nu\Bigr).$$

Now we may proceed to the proof of Theorem 12.1. For fixed $\alpha$, $\beta$, we have, 
since all occurring terms are positive, 

$$P(n,r,s,i) \ge \sum_{k=0}^{\alpha}\sum_{\ell=0}^{\beta} Q(n,r,k)\,H(n,r-k,s-\ell,i)\,Q(n,s,\ell)$$

$$\ge H(n,r,s,i)\,f_i(n,r,s,i,\alpha,\beta)\sum_{k=0}^{\alpha} Q(n,r,k)\sum_{\ell=0}^{\beta} Q(n,s,\ell). \qquad (12.6)$$

Taking the limit of the first, resp. second, sum in the last member of inequality 
(12.6) yields the probability distribution functions of a Poisson distribution 
with parameter $\lambda$, resp. $\mu$, evaluated at $\alpha$, resp. $\beta$. 

On the other hand, since $H(n,r-k,s-\ell,i) \le 1$ ($k > \alpha$ or $\ell > \beta$) and 
$H(n,r-k,s-\ell,i) \le H(n,r,s,i)\,f_s(n,r,s,i,\alpha,\beta)$ (else), we get 

$$P(n,r,s,i) \le H(n,r,s,i)\,f_s(n,r,s,i,\alpha,\beta)\sum_{k=0}^{\alpha}\sum_{\ell=0}^{\beta} Q(n,r,k)\,Q(n,s,\ell)$$

$$+ \sum_{k=\alpha+1}^{r-i} Q(n,r,k)\sum_{\ell=0}^{\beta} Q(n,s,\ell) + \sum_{k=0}^{\alpha} Q(n,r,k)\sum_{\ell=\beta+1}^{s-i} Q(n,s,\ell) + \sum_{k=\alpha+1}^{r-i} Q(n,r,k)\sum_{\ell=\beta+1}^{s-i} Q(n,s,\ell).$$

On the right-hand side, again $f_s(n,r,s,i,\alpha,\beta) \to 1$, whereas the sums tend 
to the corresponding Poisson distribution functions F, resp. to $1 - F$, more 
precisely: 

$$\lim P(n,r,s,i) \le \lim H(n,r,s,i)\,F_\lambda(\alpha)F_\mu(\beta) + (1 - F_\lambda(\alpha))F_\mu(\beta) + F_\lambda(\alpha)(1 - F_\mu(\beta)) + (1 - F_\lambda(\alpha))(1 - F_\mu(\beta)). \qquad (12.7)$$

Now letting $\alpha, \beta \to \infty$ yields Theorem 12.1. 
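
For moderate n the quality of the Poisson approximation can be checked exactly. The following is a minimal sketch of formula (12.2); as a labeled substitution, it evaluates $Q(n,r,k)$ through Stirling numbers of the second kind, $Q(n,r,k) = S(r, r-k)\, n(n-1)\cdots(n-r+k+1)/n^r$, rather than through the multinomial sum over R used in the text (the two expressions count the same drawings). The function names and the parameter values are illustrative.

```python
import math


def stirling2_row(r: int) -> list:
    """[S(r, 0), ..., S(r, r)]: Stirling numbers of the second kind."""
    row = [1]  # S(0, 0) = 1
    for m in range(1, r + 1):
        new = [0] * (m + 1)
        for j in range(1, m + 1):
            stay = j * row[j] if j < m else 0   # element m joins an existing block
            new[j] = stay + row[j - 1]          # or opens a new block
        row = new
    return row


def coincidence_probs(n: int, r: int) -> list:
    """Q(n, r, k) for k = 0..r-1: r draws with replacement from n elements
    hit exactly r - k distinct elements."""
    stirling = stirling2_row(r)
    return [stirling[r - k] * math.perm(n, r - k) / n ** r for k in range(r)]


def hypergeometric(n: int, r: int, s: int, i: int) -> float:
    """H(n, r, s, i) as in (12.3)."""
    return math.comb(r, i) * math.comb(n - r, s - i) / math.comb(n, s)


def common_elements_pmf(n: int, r: int, s: int, i: int) -> float:
    """P(n, r, s, i) evaluated by the double sum (12.2)."""
    q_r, q_s = coincidence_probs(n, r), coincidence_probs(n, s)
    total = 0.0
    for k in range(min(r - i, r - 1) + 1):
        for l in range(min(s - i, s - 1) + 1):
            total += q_r[k] * hypergeometric(n, r - k, s - l, i) * q_s[l]
    return total


def poisson_pmf(nu: float, i: int) -> float:
    return math.exp(-nu) * nu ** i / math.factorial(i)


n, r, s = 500, 30, 30
exact = [common_elements_pmf(n, r, s, i) for i in range(r + 1)]
for i in range(5):
    print(i, round(exact[i], 4), round(poisson_pmf(r * s / n, i), 4))
```

Even for these small parameters, where r/n is not yet negligible, the exact probabilities stay close to the Poisson values with parameter rs/n.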



12.3 The Meet-in-the-Middle Attack 

Here, we consider the so-called Rabin scheme, which is given as follows: Let 
the plaintexts x consist of n blocks $x_{(1)}, x_{(2)}, \ldots, x_{(n)} \in \mathbb{B}^{56}$. Similarly, the 
ciphertext y will be written as 

$$y = (y_{(1)}, \ldots, y_{(n)}) \in (\mathbb{B}^{56})^n.$$

Denote by $E_k(x)$ the DES encryption of the plaintext x with key k and by $D_k(y)$ 
the DES decryption of the ciphertext y with key k (for the description of 
DES see any general manual on modern cryptology). Then the hash values 
$h_1, h_2, \ldots, h_n$ are defined as 

$$h_j := E_{x_{(j)}}(h_{j-1}) \qquad (1 \le j \le n),$$

where $h_0$ is some uniformly distributed random element of $\mathbb{B}^{64}$. We will write 

$$E_x(h) := E_{x_{(n)}}(E_{x_{(n-1)}}(\ldots(E_{x_{(1)}}(h))\ldots)),$$

$$D_y(h) := D_{y_{(1)}}(\ldots(D_{y_{(n)}}(h))\ldots).$$

Now for mounting the so-called meet-in-the-middle attack, Eve first generates 
$2^{32}$ messages $x_{[\ell]}$ and $x_{[r]}$, and calculates their values $h_\ell$ $(= E_{x_{[\ell]}}(h_0))$ and 
$h_r$ $(= D_{x_{[r]}}(h_{n_r}))$ (where $n_r$ denotes the number of 56-bit blocks of $x_{[r]}$). Now 
she sorts the lists of all values $h_\ell$, resp. of all values $h_r$ (recall that sorting 
is “fast” in the sense that sorting a list of n elements requires $O(n \log n)$ 
operations). If one supposes that E encrypts “randomly enough”, this can 
be considered (before ordering) as two random samples of $2^{32}$ drawings with 
replacement from a total population of $2^{64}$ elements. So by Theorem 12.1, 
during the sorting, a coincidence (i.e., a case where there exist some $\ell_0$, $r_0$ 
such that $h_{\ell_0} = h_{r_0}$) occurs with probability at least about $1 - e^{-1}$. Now put 

$$x := (x_{[\ell_0]}, x_{[r_0]}).$$

Then we get 

$$E_x(h_0) = E_{x_{[r_0]}}(h_{\ell_0}) = E_{x_{[r_0]}}(h_{r_0}) = h_{n_{r_0}}.$$

So $h_0$ and $h_{n_{r_0}}$ are, as one says, “linked up” (or “joined up”) by x. Hence 
Eve can construct a fraudulent message x'. 
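
The attack can be replayed at toy scale. The following is a minimal sketch under loudly stated assumptions: DES is replaced by a hypothetical 16-bit Feistel cipher whose round function is derived from SHA-256, message blocks are small integers used as keys, and a dictionary lookup replaces the sorting step. With $2^{11}$ forward and $2^{11}$ backward values in a population of $2^{16}$, the expected number of coincidences is $2^{22}/2^{16} = 64$, so a match is practically certain. None of the names below come from the text.

```python
import hashlib


def round_function(half: int, key: int, round_index: int) -> int:
    """Toy 8-bit round function derived from SHA-256 (illustration only)."""
    data = bytes([half, round_index]) + key.to_bytes(4, "big")
    return hashlib.sha256(data).digest()[0]


def encrypt(key: int, block: int) -> int:
    """4-round Feistel permutation on 16-bit blocks, standing in for DES."""
    left, right = block >> 8, block & 0xFF
    for i in range(4):
        left, right = right, left ^ round_function(right, key, i)
    return (left << 8) | right


def decrypt(key: int, block: int) -> int:
    """Inverse of `encrypt` for the same key."""
    left, right = block >> 8, block & 0xFF
    for i in reversed(range(4)):
        left, right = right ^ round_function(left, key, i), left
    return (left << 8) | right


def rabin_hash(message, h0: int) -> int:
    """h_j = E_{x_(j)}(h_{j-1}); the final value is the hash."""
    h = h0
    for block in message:
        h = encrypt(block, h)
    return h


def meet_in_the_middle(h0: int, target: int, count: int):
    """Find a two-block message (left, right) with
    E_right(E_left(h0)) == target, or return None."""
    forward = {encrypt(key, h0): key for key in range(count)}
    for key in range(count, 2 * count):
        middle = decrypt(key, target)
        if middle in forward:
            return (forward[middle], key)
    return None


h0 = 0x1234
genuine = (7, 42, 99)
target = rabin_hash(genuine, h0)
forged = meet_in_the_middle(h0, target, 2048)
print(forged, hex(target))
```

The forged two-block message hashes to the same value as the genuine three-block message, although Eve never inverted the hash as a whole; she only met a forward computation from $h_0$ with a backward computation from the target value.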




13 Quantum Cryptography 



In this final short chapter, we will present the fundamental idea of quan- 
tum cryptography. This is not the same thing as quantum computing treated 
in Chapter 3: There, quantum computers are used to cryptanalyze classical 
cryptosystems. 

The most fundamental method of quantum cryptography can be demon- 
strated by the following example: 

- Alice sends Bob a string of photon pulses. She polarizes every photon (ran- 
domly) in one of 4 possible directions: horizontal, vertical, left-diagonal, or 
right-diagonal, for example. 



II/--I-/ 

- Bob is in possession of a polarization detector, which he can set to mea- 
sure the rectilinear or diagonal polarization. For this, he can, for example, 
use a calcium carbonate crystal. Since in this material electrons are bound 
with different strengths in different directions, a photon passing through 
the crystal “feels” a different electromagnetic force depending on the ori- 
entation of the electric field relative to the polarization axis in the crystal. 
Bob can not measure both types of polarization, since in quantum mechan- 
ics measuring the one destroys the possibility of measuring the other (see 
also Chapter 3 for more details). If he sets his detector to measure recti- 
linear polarization and if Alice polarized her photon really as “horizontal” 
(— ) or “vertical” (|), then Bob will learn how Alice polarized her photon. 
The same is true if Alice polarized as “\” or “/” and Bob measures 
diagonal polarization. However, if he sets the detector to measure rectilinear 
polarization and Alice polarized diagonally, Bob will obtain a random 
measurement and, what is more, he will not know the difference. So Bob 
will set his detector at random, e.g., 

drrdddrdrr 

(where “r” means “rectilinear” and “d” stands for “diagonal”). In our 
example, he could, for example, obtain the result 

/I-/-I 
- Bob tells Alice (over an insecure channel) his detector settings. 

- Alice tells Bob which settings (rectilinear or diagonal) were correct. Here, 
for example, the detector was correctly set for the photon pulses numbers 
2, 6, 7, and 9. 

- Alice and Bob keep only those polarizations that were correctly measured, 
so here 

^1 ^ ^ ^ ^ ^ 

These correctly measured polarizations can be used as a message (or a key) 
in the form of a bitstring by a prearranged code. 
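
The steps above can be simulated classically. The following is a minimal sketch, assuming an idealized channel without noise and the simplest eavesdropping strategy (Eve measures every photon in a randomly chosen setting and passes on what she measured); the protocol is commonly known as BB84, a name not used in the text, and all function names are illustrative. A polarization is modeled as a pair (bit, setting) with settings "r" and "d", which plays the role of the prearranged code.

```python
import random


def measure(bit: int, prepared: str, setting: str, rng: random.Random) -> int:
    """Measuring in the setting used for preparation returns the bit;
    measuring in the other setting returns a uniformly random result."""
    return bit if prepared == setting else rng.randrange(2)


def exchange(num_pulses: int, rng: random.Random, eavesdrop: bool = False):
    """Return the bits Alice and Bob keep after comparing detector settings."""
    alice_key, bob_key = [], []
    for _ in range(num_pulses):
        bit, alice_setting = rng.randrange(2), rng.choice("rd")
        photon_bit, photon_setting = bit, alice_setting
        if eavesdrop:
            # Eve measures in a random setting and thereby re-polarizes the photon.
            eve_setting = rng.choice("rd")
            photon_bit = measure(photon_bit, photon_setting, eve_setting, rng)
            photon_setting = eve_setting
        bob_setting = rng.choice("rd")
        result = measure(photon_bit, photon_setting, bob_setting, rng)
        if bob_setting == alice_setting:   # only correctly set pulses are kept
            alice_key.append(bit)
            bob_key.append(result)
    return alice_key, bob_key


def error_rate(first, second) -> float:
    return sum(a != b for a, b in zip(first, second)) / len(first)


rng = random.Random(13)
quiet_alice, quiet_bob = exchange(2000, rng)
tapped_alice, tapped_bob = exchange(2000, rng, eavesdrop=True)
print(len(quiet_alice), error_rate(quiet_alice, quiet_bob))
print(len(tapped_alice), error_rate(tapped_alice, tapped_bob))
```

Without Eve the kept bits agree and about half of the pulses survive; with Eve about a quarter of the kept bits disagree, which is the discrepancy Alice and Bob look for when they compare part of their bitstrings.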

Since Bob will guess correctly in half of the cases, in order to generate n bits 
one has to use about 2n photon pulses. The important feature of quantum 
cryptography is that Eve can really not eavesdrop. Like Bob, she has to guess 
which type of polarization (rectilinear or diagonal) she has to measure, and 
she will be wrong in half of the cases. But then the polarization of the photons 
is changed, and Alice and Bob after comparing their bitstrings at the end, will 
find discrepancies, which shows them that there has been an attack by Eve. 
So they will just not use these bits and create new ones. By doing enough 
comparisons, they can get arbitrarily good security against an eavesdropping 
by Eve. 

For more precise and further information on quantum cryptography, see, for 
example, Clearwater, Williams (1998), Chapter 8 or Hungerbühler, Struwe 
(2003). 
a generalization of certain of O’Connor’s (1995) results for bitstrings to 
sequences of elements of an arbitrary residue ring. Closely related is also the 
paper Hawkes, O’Connor (1999). 

Chapter 10 is about the notion of semantic security (Goldreich (1993)). 

The algorithmic complexity (or Turing-Kolmogorov-Chaitin complexity) 
discussed in Chapter 11 is of rather theoretical interest (see Beth, Dai (1990)). 
Chapter 12 addresses consequences of the well-known “birthday” paradox in 
probability theory for cryptology (especially hash functions). In many 
cryptology texts, one can find the keyword “birthday attack”. In particular, 
Sections 12.2 and 12.3 are based on Campana et al. (1988). 

Finally, Chapter 13 is an informal standard short introduction to quantum 
cryptography. A more sophisticated treatment of it can, e.g., be found in 
Clearwater, Williams (1998). See also Hungerbühler, Struwe (2003). 